ABSTRACT


In this thesis, we present a recursive Bayesian solution to the problem of joint tracking
and classification for ground-based air surveillance. In our system, we specifically allow
for complications due to multiple targets, false alarms, and missed detections. Most
importantly, though, we utilize the full benefit of a joint approach by implementing our
tracker using an aerodynamically valid flight model that requires aircraft-specific
coefficients such as the wing area, minimum drag, and vehicle mass. These
coefficients are provided to our tracker by our classifier.

The  key  feature  that  bridges  the  gap  between  tracking  and  classification  is  radar 
cross  section  (RCS),  which  we  include  in  our  measurement  vector.  By  modeling  the 
true  deterministic  relationship  that  exists  between  RCS  and  target  aspect,  we  are  able 
to  gain  both  valuable  class  information  and  an  estimate  of  target  orientation.  However, 
the  lack  of  a  closed-form  relationship  between  RCS  and  target  aspect  prevents  us  from 
using  the  Kalman  filter  or  any  of  its  variants.  Instead,  we  rely  upon  a  sequential  Monte 
Carlo-based  approach  known  as  particle  filtering.  In  addition  to  allowing  us  to  include 
RCS  as  a  component  in  our  measurement  vector,  the  particle  filter  also  simplifies  the 
implementation  of  our  nonlinear  non-Gaussian  flight  model. 

Thus, we believe that we are the first to provide a joint tracking/classification framework
that realizes the full potential of such an approach. Our joint formulation consists
of  three  key  developments:  (1)  an  aerodynamically  valid  flight  model  that  relies  upon 
aircraft-specific  coefficients  such  as  the  wing  area  and  the  minimum  value  of  drag, 
(2)  an  electromagnetically  correct  model  for  RCS  that  yields  information  pertaining  to 
both  class  identity  and  target  orientation,  and  (3)  a  particle  filter-based  implementation 
that  takes  into  account  realistic  difficulties  caused  by  multiple  targets,  false  alarms,  and 
missed  detections. 
CHAPTER 1


INTRODUCTION 


In  the  most  general  sense,  the  two  questions  that  a  surveillance  system  tries  to  answer 
are “Where is it?” and “What is it?” The question of “Where?” is a matter of target
tracking. The question of “What?” is a matter of classification. In recent years,
it  has  been  suggested  in  the  literature  that  tracking  and  classification  should  proceed 
simultaneously,  or  jointly,  whenever  possible  [1-6].  While  it  might  seem  that  a  joint 
approach to tracking and classification could do no worse than separate implementations
of each, this is not necessarily the case. Furthermore, significant improvements in
performance  require  the  two-way  exchange  of  useful  information  between  the  tracker 
and  classifier.  Broadly  speaking,  the  kinematic  output  of  the  tracker  (e.g.,  position  and 
velocity)  should  improve  the  performance  of  the  classifier,  just  as  the  target  identity 
supplied  by  the  classifier  should  improve  track  accuracy.  As  a  means  of  introduction, 
we  will  consider  two  existing  approaches  to  joint  tracking/classification.  We  will  show 
why  both  techniques  fall  short  of  delivering  the  full  benefit  of  a  joint  approach.  Then, 
we  will  demonstrate  how  the  joint  formulation  presented  in  this  thesis  is  able  to  deliver 
upon  its  potential. 

Before  discussing  the  two  existing  approaches  to  joint  tracking/classification,  it  is 
helpful  to  introduce  the  actual  application  that  we  have  in  mind.  In  this  work,  we 
are  primarily  interested  in  designing  a  ground-based  passive  radar  system  for  tracking 
and  classifying  aircraft.  While  the  assumption  that  the  radar  is  passive  does  limit  the 
type  and  quality  of  data  available  to  us,  it  does  not  impact  our  present  discussion. 
For  now,  it  is  enough  to  focus  on  the  broader  task  of  air  surveillance.  It  is  important, 
though,  to  specify  what  we  mean  by  classification.  For  our  classifier,  we  are  specifically 
interested in identifying the model of an aircraft (e.g., F-15 or T-38), as opposed to other
systems whose labels correspond to broad classifications such as “fighter/airliner” or
“friend/neutral/foe.”

As  it  happens,  the  distinction  between  specific  class  labels  (e.g.,  F-15)  and  broad 
class  labels  (e.g.,  fighter/airliner)  is  the  primary  difference  between  current  approaches 
to  joint  tracking/classification.  In  systems  that  label  targets  broadly,  classification  is 
often  based  upon  class-dependent  kinematic  models  [3,4,6].  For  example,  in  [3], 
the  target  was  assumed  to  belong  to  one  of  two  possible  classes.  The  first  class  was 
defined  for  highly  agile  airplanes  such  as  fighters.  The  second  class  was  defined  for 
lower speed, less maneuverable targets such as commercial airliners. The difficulty
with  designations  such  as  these  is  that  they  are  not  mutually  exclusive.  Thus,  there  is 
nothing  to  stop  a  highly  maneuverable  target  such  as  a  fighter  from  flying  slow  and 
straight,  in  which  case  it  will  be  classified  by  the  system  as  an  airliner. 

The  second  approach  to  joint  tracking/classification  includes  at  least  one  feature 
within  its  set  of  measurements  with  the  potential  to  discriminate  between  specific  target 
types.  For  example,  in  [1]  the  authors  suggested  using  high-resolution  range  profiles; 
in [2, 5], radar cross section (RCS) was chosen. By not relying on kinematic information
alone, the classifier has the potential to be much more accurate. In all three papers,
the  claim  was  made  that  knowledge  of  the  specific  target  type  could  be  used  to  improve 
tracking  performance.  For  example,  in  [2,5],  a  rigid-body  motion  model  was  used  to 
represent target dynamics. The forces and torques that cause the translational and rotational
motion were modeled as independent zero-mean Gaussian random processes. It
was  then  suggested  that  the  covariances  of  these  processes  could  be  varied  based  upon 
the output of the classifier. Thus, the covariance matrix for an aircraft that was highly
maneuverable would be “larger” than that for an aircraft with low maneuverability. However,
this  can  cause  the  same  sort  of  problem  as  broad  class  labels.  Namely,  what  happens  if 
a  highly  maneuverable  target  does  not  maneuver?  In  this  case,  the  covariance  matrices 
of  the  random  forces  and  torques  are  poorly  matched  to  the  target’s  motion,  and  the 
influence  of  the  classifier  could  actually  degrade  tracking  accuracy. 

At this point, a natural question might be whether there is any aircraft-specific information
that could improve tracking performance. The answer is yes, but it actually
requires a new perspective on tracking. In [7, 8] the authors suggested a motion model
that used actual aerodynamic equations for flight. In both papers, it was noted that, in
flight, the predominant acceleration is caused by the lift force on the airframe. Furthermore,
it was pointed out that the magnitude of the lift force was proportional to the
target’s speed squared while the direction of the force was nearly normal to the target’s
velocity vector and wings. This is a substantial departure from the common assumption
that target accelerations along each coordinate dimension are uncoupled and can
be modeled as either white Gaussian noise or Brownian motion. There are two difficulties
with the aerodynamically motivated models in [7, 8], though. First, determining
the magnitude of the lift force requires knowledge of a constant of proportionality that
is aircraft-specific. Second, determining the direction of the lift force requires knowledge
of the plane’s angular orientation. Because neither paper addressed the problem
of classification, the constant of proportionality between lift and speed squared had to
be estimated adaptively. Furthermore, the authors simply assumed that an estimate of the
target’s orientation could be acquired somehow.

With this as motivation, we can now state the primary contribution of this work.
In this thesis, we present a recursive Bayesian solution to the problem of joint tracking/classification
for ground-based air surveillance. In our design, we utilize the full
benefit of a joint approach by implementing our tracker using an aerodynamically valid
flight model similar to [8]. However, in our case, aircraft-specific coefficients such as
the wing area, minimum drag, and vehicle mass are available to the tracker because
of our joint formulation. In addition, we specifically allow for such complications as
multiple targets, false alarms, and missed detections.

The  key  feature  that  makes  our  system  possible  is  inclusion  of  radar  cross  section  as 
one  of  our  measurements.  However,  instead  of  modeling  RCS  using  a  Swerling  model 
(i.e.,  as  a  random  variable  drawn  from  an  exponential  distribution  [4]),  we  directly 
account  for  the  complex  relationship  that  exists  between  RCS  and  variables  such  as 
target  geometry,  position,  and  angular  orientation.  By  treating  RCS  as  a  deterministic 
(albeit  complex)  function  plus  noise,  we  gain  two  crucial  benefits.  First,  we  are  able  to 
use  the  time  evolution  of  a  target’s  radar  cross  section  as  the  primary  feature  for  class 
discrimination.  Second,  we  are  able  to  track  a  target’s  time-varying  angular  orientation, 
which  is  required  by  our  flight  model.  However,  accommodating  the  complex  nature 
of  RCS  does  not  come  without  a  penalty.  Namely,  the  lack  of  a  closed-form  expression 
for  RCS  prevents  us  from  using  an  extended  Kalman  filter  or  any  of  its  variants  for  state 
estimation.  Instead,  we  rely  upon  a  sequential  Monte  Carlo-based  approach  known  as 
particle filtering to perform the needed inference [9-15]. In addition to allowing us to
include  RCS  as  a  component  of  our  measurement  vector,  we  will  also  show  that  particle 
filtering  simplifies  the  implementation  of  our  nonlinear  non-Gaussian  flight  model. 

In  summary,  although  others  have  suggested  that  tracking  and  classification  should 
be  performed  jointly,  we  believe  that  we  are  the  first  to  offer  a  solution  that  realizes 
the full potential of such a coupling. Our joint formulation consists of three key developments:
(1) an aerodynamically valid flight model that relies upon aircraft-specific
coefficients such as the wing area and the minimum value of drag, (2) an electromagnetically
correct model for RCS that yields information pertaining to both class
identity and target orientation, and (3) a particle filter-based implementation that takes
into  account  realistic  difficulties  caused  by  multiple  targets,  false  alarms,  and  missed 
detections. 

This thesis is organized as follows. Chapter 2 presents relevant aspects of scattering
theory including the definition of radar cross section. It highlights the complex
relationship between target aspect and RCS. In Chapter 3, we introduce the stochastic
framework in which we will formulate our joint tracking/classification problem. We
also  demonstrate  how  the  inclusion  of  RCS  as  a  data  feature  prevents  us  from  using  any 
variant  of  Kalman  filtering.  In  Chapter  4,  we  introduce  sampling-based  approaches  to 
sequential estimation, with an emphasis on importance sampling and the particle filter.
We demonstrate that the flexibility of the particle filtering algorithm allows us to
cast our joint tracking/classification problem in the desired recursive Bayesian framework.
Next, in Chapter 5, we present a tracker based on the extended Kalman filter that
will  serve  as  a  benchmark  for  comparison  when  we  present  our  experimental  results. 
Chapter  6  contains  the  full  details  of  our  system,  including  our  flight  model  and  our 
measurement  equation  for  RCS.  Next,  extensive  experimental  results  are  presented  in 
Chapter  7.  The  goal  of  this  chapter  is  twofold.  First,  we  demonstrate  the  significant 
improvement  in  tracking  accuracy  that  can  be  achieved  through  a  joint  formulation 
with  an  aerodynamically  correct  flight  model.  Second,  we  explore  the  capabilities  of 
particle  filter-based  systems  in  the  presence  of  data  uncertainty  due  to  false  alarms  and 
multiple targets. Finally, we conclude in Chapter 8. Chapters 2-4 provide the necessary
background for our formulation. Chapters 6 and 7 contain the main contributions
of this work.
CHAPTER 2


BACKGROUND  IN 
ELECTROMAGNETICS 


The  interaction  between  signal  and  target  in  radar  systems  is  properly  addressed  using 
electromagnetic  (EM)  scattering  theory.  In  this  section,  we  present  an  overview  of  the 
EM  concepts  pertinent  to  our  application.  As  a  means  of  motivating  this  discussion, 
we begin by introducing the operational scenario that we will simulate in Chapter 7:
ground-based  FM-band  passive  air  surveillance.  A  discussion  of  FM-band  passive 
radar  then  leads  naturally  to  the  consideration  of  radar  cross  section  as  a  potential 
feature  for  classification. 


2.1  Ground-Based  FM-Band  Passive  Radar 

The majority of modern radar systems are active. An active radar both transmits and receives
electromagnetic signals for the purpose of detection and estimation. In contrast,
a passive radar system does not transmit electromagnetic energy of its own; instead, it
relies on “illuminators of opportunity” such as commercial radio or television broadcasts.
Regardless of the source or nature of transmission, both types of radar attempt
to measure scattering of these signals in order to detect and locate targets in the nearby
airspace. We will refer to a single pass made by a radar in search of new or existing
targets as a scan. For convenience, we assume that scans are performed every $\Delta t$ seconds.
As such, the search performed by the radar at time $k\Delta t$ will be referred to as
scan $k$. The result of each scan is a set of detections and their associated measurements
(e.g., delay and Doppler shift). Detections across consecutive scans that are determined
to have originated from the same target are associated and referred to as a track. Measured
data corresponding to a given track can then be filtered to produce an estimate of
the unknown flight path.

Although most radar systems are active, passive radar can provide some advantages,
especially in military contexts [2, 16, 17]. Because an active radar advertises its
location through its transmissions, it can quickly become a target itself. A passive radar,
on the other hand, offers the primary benefits of survivability and robustness against deliberate
directional interference. Second, because commercial transmitters focus their
energy toward the Earth’s surface, passive radar permits illumination of low-flying aircraft.
Finally, the signals used by passive radar typically lie in the VHF/UHF frequency
ranges (55-885 MHz). At these frequencies, stealth measures such as target shaping
and the use of radar absorbing material (RAM) are much less effective. For example,
it has been found that the ability of surface faceting to deflect signal energy away from
the radar is only effective if the dimensions of a single facet are large compared to the
wavelength. At wavelengths on the order of a single facet or larger, the strength of the
radar return becomes dependent on the target’s volume rather than its fine structure.
Similarly, radar absorbing material must be matched to the wavelength of the incident
radiation. For example, dielectric absorbers typically need to be applied in
layers that are 0.01-0.1 $\lambda$ thick, where $\lambda$ is the wavelength of the radar signal. At
VHF/UHF frequencies, this is generally too thick to be used on air vehicles because it
would change their aerodynamic properties [18].

The  main  disadvantage  of  passive  radar  is  a  matter  of  signal  design.  As  mentioned, 
a  passive  radar  gathers  information  about  targets  using  commercial  broadcasts,  most 
of  which  are  far  from  optimal  for  use  in  tracking.  To  understand  why,  consider  that  an 
active  radar  estimates  range  by  measuring  the  time  delay  between  pulse  transmission 
and  reception  of  the  target  echo.  In  contrast,  commercial  broadcasts  are  not  pulsed 
waveforms;  they  are  continuous-wave  signals  that  may  be  considered  to  be  “always 
on.”  In  the  case  of  a  passive  radar  using  FM  broadcasts,  target  range  is  estimated  by 
correlating  the  modulation  waveform  of  the  direct-path  FM  signal  with  the  modulation 
waveform  of  the  scattered  signal.  As  such,  the  accuracy  of  this  range  measurement, 
or  range  resolution,  is  related  to  the  maximum  excursion  of  the  frequency  modulation. 
More generally, it can be shown that the achievable range resolution is inversely proportional
to the bandwidth of the illuminating signal [19]. Because a chirp waveform
used  by  an  active  radar  can  have  a  bandwidth  that  is  10  000  times  larger  than  that  of  an 
FM  radio  broadcast,  the  challenge  inherent  in  passive  radar  is  clear. 
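The scale of this gap can be illustrated with a short calculation. The sketch below assumes the common monostatic approximation $\Delta R = c / (2B)$ for the nominal range resolution of a signal with bandwidth $B$; the bistatic geometry of a passive system changes the constant factor but not the inverse dependence on bandwidth. The FM bandwidth of 50 kHz is an illustrative assumption, not a value taken from this thesis.

```python
C = 299_792_458.0  # speed of light in m/s


def range_resolution(bandwidth_hz):
    """Nominal range resolution c / (2B), assuming a monostatic geometry."""
    return C / (2.0 * bandwidth_hz)


# Assumed, illustrative bandwidths (not taken from the thesis).
fm_bandwidth_hz = 50e3
chirp_bandwidth_hz = 10_000 * fm_bandwidth_hz  # the factor quoted in the text

fm_resolution_m = range_resolution(fm_bandwidth_hz)        # roughly 3 km
chirp_resolution_m = range_resolution(chirp_bandwidth_hz)  # roughly 0.3 m

print(f"FM broadcast: {fm_resolution_m:.1f} m")
print(f"Chirp:        {chirp_resolution_m:.3f} m")
```

Under these assumptions, a 10 000-fold increase in bandwidth translates directly into a 10 000-fold finer range resolution.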

Having established that the tracking performance of passive radar may be substantially
worse than that of its active counterpart, a logical question might be, “Are there
any  advantages  to  using  passive  radar  for  nonmilitary  applications?”  As  it  turns  out, 
the  FM  frequency  band  is  well-suited  to  target  classification.  Before  we  can  explain 
why  this  is  so,  we  first  must  introduce  the  concept  of  radar  cross  section. 

2.2  Radar  Cross  Section 

In radar applications, we are often interested in how well a target captures and reradiates
electromagnetic energy. Radar cross section (RCS) is a measure of the magnitude
of  this  reflection  process  expressed  as  an  effective  area.  RCS  is  a  function  of  the  size, 
shape,  and  composition  of  the  target  as  well  as  the  polarization  and  frequency  of  the 
incident  wave.  RCS  also  depends  on  the  position  and  orientation  of  the  target  with 
respect to both the source of the incident wave (e.g., the FM transmitter) and the measurement
location (i.e., the radar receiver). RCS is formally defined as

$$\sigma(\mathbf{k}_i, \mathbf{k}_s) = \lim_{R \to \infty} 4\pi R^2 \, \frac{|\mathbf{E}^s(\mathbf{k}_s)|^2}{|\mathbf{E}^i(\mathbf{k}_i)|^2}, \tag{2.1}$$


where $|\mathbf{E}^i|$ is the amplitude of the incident wave measured at the target, and $|\mathbf{E}^s|$ is
the amplitude of the scattered wave measured at a distance $R$ from the target [20].
The variables $\mathbf{k}_i$ and $\mathbf{k}_s$ denote the incident and scattered wavevectors, respectively,
expressed in a coordinate system that is centered on the target.¹ The limiting process
ensures that the scattered signal is measured in the far-field. The incident wave in the
RCS  calculation  is  commonly  taken  to  be  a  plane  wave.  This  assumption  is  justified 
when  the  distance  between  the  transmitter  and  the  target  is  large  because  spherical 
phase  fronts  can  then  be  approximated  as  locally  planar. 

Because  RCS  is  a  function  of  size,  shape,  and  composition  of  the  target,  it  has 
potential  as  a  feature  for  classification.  However,  a  radar  cannot  measure  RCS  directly. 
Instead, it measures the power $P_r$ of the received waveform, which is related to RCS
through the radar range equation

$$\sigma = \frac{(4\pi)^3 R_t^2 R_r^2}{G_t P_t G_r \lambda^2} \, P_r, \tag{2.2}$$

where $P_t$ is the transmitted power, $\lambda$ is the wavelength, $R_t$ is the transmitter-to-target
range, and $R_r$ is the receiver-to-target range.² $G_t$ and $G_r$ are antenna gains for the
transmitter and receiver, respectively. The fraction in (2.2) can be viewed as the normalization
term needed to remove the effects of transmission and propagation from
the received power measurement, leaving a quantity (RCS) that depends on the target
alone. It is reasonable to assume that $P_t$, $G_t$, and $G_r$ are either known quantities or
may be determined through suitable calibration. The two distances, $R_t$ and $R_r$, must be
approximated  using  an  estimate  of  the  target’s  position  (which  is  available,  of  course, 
in  a  joint  tracking/classification  framework). 
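The normalization described above can be sketched in a few lines. The code assumes the standard bistatic form of the radar range equation, $\sigma = (4\pi)^3 R_t^2 R_r^2 P_r / (G_t P_t G_r \lambda^2)$, with gains given as linear ratios rather than in decibels. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def rcs_from_received_power(p_r, p_t, g_t, g_r, wavelength, r_t, r_r):
    """Convert received power (W) to RCS (m^2) via the bistatic range equation.

    Gains are linear (not dB); ranges and wavelength are in meters.
    """
    normalization = ((4.0 * math.pi) ** 3 * r_t**2 * r_r**2) / (
        g_t * p_t * g_r * wavelength**2
    )
    return normalization * p_r


def received_power(sigma, p_t, g_t, g_r, wavelength, r_t, r_r):
    """Inverse relation: received power (W) produced by a target of RCS sigma."""
    return (p_t * g_t * g_r * wavelength**2 * sigma) / (
        (4.0 * math.pi) ** 3 * r_t**2 * r_r**2
    )


# Hypothetical scenario: 100 kW transmitter, 3 m wavelength (100 MHz),
# target 40 km from the transmitter and 30 km from the receiver.
params = dict(p_t=100e3, g_t=1.0, g_r=10.0, wavelength=3.0, r_t=40e3, r_r=30e3)
p_r = received_power(sigma=10.0, **params)
sigma_recovered = rcs_from_received_power(p_r, **params)
```

In a joint tracker/classifier, `r_t` and `r_r` would come from the current position estimate, so errors in the estimated position propagate into the inferred RCS through the squared range terms.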

Now, we return to our claim that FM radio frequencies are well-suited for classification.
As we have just seen, the RCS of a target varies as its position and orientation
with respect to the transmitter or receiver change. To ensure robust classification in
the presence of noise, which may induce errors in our estimates of position and orientation,
it is helpful that RCS vary “slowly” with small changes in these components of
the target’s state. The variation in RCS, as reflected by the number of nulls encountered
as a target’s aspect changes, is proportional to the maximum dimension of the target
in wavelengths. At FM-band frequencies (100 MHz), a fighter-sized aircraft is approximately
five wavelengths long. In contrast, at X-band frequencies used by active radars
(10 GHz), the same aircraft would be 500 wavelengths long! Therefore, although VHF
frequencies yield poor range resolution, they have the potential for robust classification
through measurement of RCS [21, 22].
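The electrical-length comparison above follows from $\lambda = c/f$. The short sketch below reproduces the arithmetic; the 15 m aircraft length is an assumed, representative figure for a fighter-sized target, not a value given in the text.

```python
C = 299_792_458.0  # speed of light in m/s


def length_in_wavelengths(length_m, frequency_hz):
    """Physical length expressed in wavelengths at the given frequency."""
    wavelength_m = C / frequency_hz
    return length_m / wavelength_m


aircraft_length_m = 15.0  # assumed fighter-sized length (illustrative)

fm_band = length_in_wavelengths(aircraft_length_m, 100e6)  # about 5 wavelengths
x_band = length_in_wavelengths(aircraft_length_m, 10e9)    # about 500 wavelengths

print(f"100 MHz: {fm_band:.1f} wavelengths")
print(f"10 GHz:  {x_band:.1f} wavelengths")
```

Because the number of RCS nulls scales with this electrical length, the same aspect change sweeps through roughly 100 times fewer nulls at 100 MHz than at 10 GHz.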

¹ For notational simplicity, we do not indicate the dependence of RCS on frequency and polarization in (2.1).

² The wavelength used in (2.2) corresponds to the carrier frequency of the broadcast, because that is where
most of the transmitted power is concentrated.
2.3 Method of Moments


We  have  just  shown  that  radar  cross  section  may  be  a  viable  feature  for  classification 
in  a  passive  radar  system.  However,  in  order  to  use  RCS  (as  measured  by  received 
power),  we  will  need  to  know  the  expected  RCS  for  each  target  type  that  we  wish  to 
classify, over the entire range of frequencies and aspect angles that may be encountered.
According to (2.1), this amounts to finding the scattered far-field $\mathbf{E}^s(\mathbf{k}_s)$, given
the incident frequency and $\mathbf{E}^i(\mathbf{k}_i)$. To accomplish this, the current $\mathbf{J}_s$ induced by the
incident wave on the surface of the target must be found. Unfortunately, analytic expressions
for $\mathbf{J}_s$ can only be found for the simplest target geometries. Instead, we must
rely  on  either  empirical  or  numeric  techniques.  In  this  section,  we  will  discuss  one  of 
the  most  popular  approaches  to  numerically  estimating  the  RCS  of  a  target.  Our  goal  is 
to  highlight  the  complicated  relationship  that  exists  between  target  state  (e.g.,  position 
and  orientation)  and  RCS. 

The  method  of  moments  (MoM)  is  a  numeric  technique  that  has  found  widespread 
use  in  the  solution  of  scattering  problems  involving  complex  targets  [23].  The  method 
of  moments  applies  to  general  linear-operator  equations,  such  as 


$$\mathcal{L} r = e, \tag{2.3}$$

where $\mathcal{L}$ is a linear operator, $r$ is an unknown response, and $e$ is a known excitation.
The unknown response $r$ is expanded as a sum of basis functions, $r = \sum_{j=1}^{N} a_j r_j$. To
solve for the $N$ unknowns $\{a_1, \ldots, a_N\}$, we write $N$ linearly independent equations.
These are obtained by taking the inner product of (2.3) with a set of $N$ testing functions
$\{y_1, \ldots, y_N\}$,

$$\langle y_i, \mathcal{L} r \rangle = \sum_{j=1}^{N} a_j \langle y_i, \mathcal{L} r_j \rangle = \langle y_i, e \rangle, \quad i = 1, \ldots, N. \tag{2.4}$$

This equation can be written in matrix notation as $Z\mathbf{a} = \mathbf{e}$, where $Z_{ij} = \langle y_i, \mathcal{L} r_j \rangle$, $\mathbf{a}$ is
a column vector of the unknown coefficients, and $e_i = \langle y_i, e \rangle$. Matrix inversion of $Z$
yields the desired basis coefficients: $\mathbf{a} = Z^{-1}\mathbf{e}$.
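The procedure above can be made concrete with a toy problem that is far simpler than electromagnetic scattering. The sketch below is an illustrative assumption, not the formulation used in this thesis: it takes the operator to be $-d^2/dx^2$ on $[0, 1]$ with zero boundary values, the excitation $e(x) = 1$, polynomial basis functions $r_j(x) = x^j(1 - x)$, and testing functions equal to the basis functions. It then builds $Z$ and the excitation vector by exact polynomial integration and solves the resulting system for the coefficients.

```python
import numpy as np
from numpy.polynomial import Polynomial


def inner(f, g):
    """Inner product <f, g>: integral of f(x) g(x) over [0, 1]."""
    antiderivative = (f * g).integ()
    return antiderivative(1.0) - antiderivative(0.0)


def apply_operator(r):
    """Toy linear operator (an assumption for illustration): L r = -r''."""
    return -r.deriv(2)


def mom_solve(num_basis):
    """Solve L r = e with e(x) = 1 by the method of moments."""
    x = Polynomial([0.0, 1.0])
    # Basis functions vanish at x = 0 and x = 1 (boundary conditions).
    basis = [x**j * (1.0 - x) for j in range(1, num_basis + 1)]
    testing = basis  # testing functions chosen equal to the basis functions
    excitation = Polynomial([1.0])

    Z = np.array([[inner(y, apply_operator(r)) for r in basis] for y in testing])
    e_vec = np.array([inner(y, excitation) for y in testing])
    a = np.linalg.solve(Z, e_vec)  # preferred over forming Z^{-1} explicitly
    return a, basis


coeffs, basis = mom_solve(3)

# Reassemble r(x) = sum_j a_j r_j(x).
approx = Polynomial([0.0])
for a_j, r_j in zip(coeffs, basis):
    approx = approx + float(a_j) * r_j

# The exact solution of -u'' = 1 with u(0) = u(1) = 0 is u(x) = x (1 - x) / 2,
# which lies in the span of the basis, so the expansion recovers it.
```

For this toy problem the matrix is only 3 by 3. In the scattering application the same structure holds, but $N$ can reach into the millions, which is what motivates the iterative solvers discussed later in this section.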

In our electromagnetic scattering application, $\mathcal{L}$ is an integrodifferential operator
derived from one of the boundary conditions of Maxwell’s equations, $e$ is an incident
field term (either electric or magnetic), and $r$ is an approximation of $\mathbf{J}_s$. The shape
of the target is provided to the MoM code by a CAD file that partitions the surface
of the target into smooth patches or planar facets. Both the basis functions $\{r_j\}$ and
the testing functions $\{y_i\}$ are then defined in terms of the facetization of the target.
A popular choice of basis is defined on pairs of adjacent triangular facets [24]. The
so-called Rao-Wilton-Glisson (RWG) basis is designed to maintain continuity of the
normal component of $\mathbf{J}_s$ across facet boundaries, thereby avoiding the need for fictitious
line and point charges. A popular choice of testing function is $y_i = r_i$, commonly
referred to as Galerkin’s method. If a target is modeled as a closed surface, use of the
RWG basis requires one function per triangular facet edge, yielding a number of unknowns
proportional to the number of facets in the CAD representation. Because the
level of facetization is dictated by the length of the target in wavelengths, an accurate
representation of the surface current on a fighter-sized aircraft at gigahertz frequencies
may require millions of facets.³

Unfortunately, with this many unknowns, it is impractical to invert the matrix $Z$. In
cases such as these, a conjugate gradient technique can be used to replace the $O(N^3)$
matrix inversion with an $O(KN^2)$ iterative approach requiring only matrix-vector multiplies.
($K$ is the number of iterations required for convergence.) If the amount of
computation is still unwieldy, the Multilevel Fast-Multipole Algorithm (MLFMA) can
be used to reduce the number of operations in the matrix-vector multiply, resulting in
$O(KN \log N)$ computation [25]. An example of one such MLFMA-based software
package is FISC (Fast Illinois Solver Code) [26]. FISC finds $\mathbf{J}_s$ as described above and
then solves for the scattered electric field using the far-field radiation integral

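$$\mathbf{E}^s(\mathbf{k}_s) \approx -\frac{jk\mu c}{4\pi R} \, e^{-jkR} \iint_S \left( \mathbf{J}_s - (\mathbf{i}_s \cdot \mathbf{J}_s)\,\mathbf{i}_s \right) e^{j\mathbf{k}_s \cdot \mathbf{r}} \, dS, \tag{2.5}$$

where $\mu$ is the permeability of the medium, $c$ is the speed of light, $\mathbf{r}$ is the location
in a target-centered coordinate system of an infinitesimal patch on the surface of the
scatterer, and $\mathbf{i}_s$ is the unit vector in the direction of $\mathbf{k}_s$. The wavenumber $k$ is just the
magnitude of $\mathbf{k}_s$, $k = 2\pi/\lambda$.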
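The iterative alternative to matrix inversion mentioned above can be sketched as follows. This is a minimal textbook conjugate gradient routine that touches the matrix only through a user-supplied matrix-vector product, which is the step a fast multipole method would accelerate. It assumes a real symmetric positive-definite system for simplicity; MoM impedance matrices are generally complex and not of this form, so practical solvers use related Krylov variants. The function and parameter names are hypothetical.

```python
import numpy as np


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b given only matvec(v) = A v.

    Assumes A is real, symmetric, and positive definite. Returns the
    solution estimate and the number of iterations performed.
    """
    x = np.zeros_like(b, dtype=float)
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x, 0

    r = b - matvec(x)  # residual
    p = r.copy()       # search direction
    rs_old = r @ r
    for k in range(1, max_iter + 1):
        Ap = matvec(p)  # the only O(N^2) step per iteration for a dense A
        alpha = rs_old / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            return x, k
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x, max_iter


# Small synthetic test system (illustrative only).
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20.0 * np.eye(20)  # symmetric positive definite by construction
b = rng.standard_normal(20)

x, iterations = conjugate_gradient(lambda v: A @ v, b)
```

Each iteration costs one matrix-vector product, so $K$ iterations cost $O(KN^2)$ for a dense matrix, consistent with the operation count quoted in the text.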

It  is  beneficial  to  pause  at  this  point  in  our  discussion  and  draw  attention  to  the 
fact  that  there  is  no  closed-form  relationship  between  a  target’s  aspect  and  its  resulting 
RCS.  Thus,  even  though  an  extended  Kalman  filter  can  handle  nonlinear  relationships 
between  state  (target  aspect)  and  measurement  (RCS),  that  relationship  must  still  be 
made  explicit.  In  the  case  of  RCS,  the  closest  we  could  come  would  involve  storing 
the entire admittance matrix ($Z^{-1}$) and manipulating a discretized version of (2.5), an
approach  that  is  clearly  impractical  for  real-time  operation. 

Up until now, we have overlooked polarization in our discussion. However, it must
be taken into account in order to understand how EM solvers like FISC calculate RCS.
Recall that $\mathbf{E}^i$ specifies the amplitude, phase, and orientation of the incident electric
field at each point in space. Because RCS is defined as a ratio of magnitudes and $\mathbf{E}^s$
is related to $\mathbf{E}^i$ through integrodifferential equations, the amplitude and phase of the
incident field are arbitrary. As such, FISC sets $|\mathbf{E}^i| = 1$. The only parameter of the
incident field that matters then is its orientation (and frequency). If we assume, in the
RCS computation, that $\mathbf{E}^i$ is a uniform plane wave traveling through free space, the
phase fronts are then planes perpendicular to the direction of propagation, $\mathbf{E}^i \cdot \mathbf{k}_i = 0$.
As such, $\mathbf{E}^i$ can be decomposed into two orthogonal components,

$$\mathbf{E}^i = E^i_\theta \, \mathbf{i}_\theta + E^i_\phi \, \mathbf{i}_\phi, \tag{2.6}$$

where $(\theta, \phi)$ represent the incident direction in a spherical coordinate system centered
on the target and $(\mathbf{i}_\theta, \mathbf{i}_\phi)$ are spherical unit vectors. Because the system relating $\mathbf{E}^i$
and $\mathbf{J}_s$ is linear, FISC only needs to compute $\mathbf{J}_s$ for $\mathbf{E}^i = \mathbf{i}_\theta$ and $\mathbf{E}^i = \mathbf{i}_\phi$; the current
distribution due to all other incident orientations can be synthesized from these two
results. The unit vectors $\mathbf{i}_\theta$ and $\mathbf{i}_\phi$ are typically referred to as “vertical” and “horizontal”
polarizations, respectively.

³ As a rule of thumb, the maximum length of a facet edge should be between 0.1 and 0.2 wavelengths.

In an analogous fashion, the scattered electric field can be expressed as a sum of
orthogonal components,

$$\mathbf{E}^s = E^s_{\theta'}\,\mathbf{i}_{\theta'} + E^s_{\phi'}\,\mathbf{i}_{\phi'}, \tag{2.7}$$

where the spherical angles $(\theta', \phi')$ in (2.7) are primed because they correspond to the
scattered direction, which is not necessarily the same as the incident direction. If
$\mathbf{k}^s = -\mathbf{k}^i$, the spherical angles will be the same and the scenario is termed monostatic
scattering. The general case is referred to as bistatic scattering. We see that, given
the incident and scattered directions, there are only four coefficients that are needed to
calculate $\mathbf{E}^s$ for any $\mathbf{E}^i$ satisfying $\mathbf{E}^i \cdot \mathbf{k}^i = 0$. These four quantities constitute entries
in a scattering matrix $S$,

$$S = \begin{bmatrix} s_{vv} & s_{vh} \\ s_{hv} & s_{hh} \end{bmatrix}, \tag{2.8}$$

where we have adopted the popular notation of using ‘v’ and ‘h’ to denote vertical
and horizontal polarizations. Because each element of $S$ varies with $\mathbf{k}^i$ and $\mathbf{k}^s$, the
scattering matrix is actually a collection of four complex-valued functions. The square
magnitude of each coefficient yields the corresponding RCS (e.g., $\sigma_{vh} = |s_{vh}|^2$). It is
specifically these scattering coefficients that FISC calculates.
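
As a minimal illustration of how the four scattering coefficients are used, the Python
sketch below applies a 2 x 2 scattering matrix to an arbitrary incident polarization and
converts a coefficient to RCS. The function names, the numerical values, and the index
convention (first index for the scattered polarization, second for the incident one) are
assumptions made for illustration only; they are not taken from FISC.

```python
import math

def scattered_components(S, e_inc):
    """Scattered (v, h) components for incident (v, h) components.

    S is ((s_vv, s_vh), (s_hv, s_hh)); the index order (scattered, incident)
    is an assumed convention, not something prescribed by the text.
    """
    (s_vv, s_vh), (s_hv, s_hh) = S
    e_v, e_h = e_inc
    return (s_vv * e_v + s_vh * e_h, s_hv * e_v + s_hh * e_h)

def rcs(s):
    """RCS as the squared magnitude of one scattering coefficient."""
    return abs(s) ** 2

def rcs_dbsm(s):
    """The same RCS expressed in decibels relative to one square meter."""
    return 10.0 * math.log10(rcs(s))

# Hypothetical coefficients for one pair of incident/scattered directions.
S = ((1.0 + 1.0j, 0.1j), (0.1j, 3.0 - 4.0j))

# Linearity: the response to a mixed polarization is the weighted sum of the
# responses to pure vertical and pure horizontal polarizations.
resp_v = scattered_components(S, (1.0, 0.0))
resp_h = scattered_components(S, (0.0, 1.0))
mixed = scattered_components(S, (0.6, 0.8))
synth = tuple(0.6 * a + 0.8 * b for a, b in zip(resp_v, resp_h))
print(mixed, synth)
print(rcs(S[1][1]), rcs_dbsm(S[1][1]))
```

The two printed tuples should agree, which is the superposition property that lets two
solver runs cover every incident polarization.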

RCS plots for two different aircraft are shown in Figure 2.1. Note that FISC specifies
the direction of wave propagation using elevation and azimuth angles (EL, AZ) in a
target-centered coordinate system. These angles are related to the spherical coordinate
system by $\mathrm{EL} = 90^\circ - \theta$ and $\mathrm{AZ} = -\phi$. An example of positive elevation and azimuth
angles is provided in Figure 2.2. In order to plot bistatic RCS in two dimensions, we
have set both incident and scattered elevation angles to zero. The plots in Figure 2.1
depict $\sigma_{hh}$, the RCS term corresponding to the $s_{hh}$ scattering coefficient.
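
The angle convention just described can be written as a pair of small conversion helpers.
This is only a sketch: the function names are hypothetical, all angles are assumed to be
in degrees, and the Cartesian unit vector assumes the usual spherical convention in which
the polar angle is measured from the +z axis and the azimuthal angle from the +x axis.

```python
import math

def spherical_to_el_az(theta_deg, phi_deg):
    """(theta, phi) -> (EL, AZ) in degrees, using EL = 90 - theta and AZ = -phi."""
    return 90.0 - theta_deg, -phi_deg

def el_az_to_spherical(el_deg, az_deg):
    """Inverse mapping: theta = 90 - EL and phi = -AZ."""
    return 90.0 - el_deg, -az_deg

def unit_vector(theta_deg, phi_deg):
    """Cartesian unit vector for spherical angles (assumed convention, see above)."""
    t, p = math.radians(theta_deg), math.radians(phi_deg)
    return (math.sin(t) * math.cos(p), math.sin(t) * math.sin(p), math.cos(t))

el, az = spherical_to_el_az(60.0, -45.0)
print(el, az)                      # elevation 30 degrees, azimuth 45 degrees
print(el_az_to_spherical(el, az))  # recovers theta = 60, phi = -45
print(unit_vector(*el_az_to_spherical(0.0, 0.0)))
```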

2.4  RCS  Databases  and  Sampling  Considerations 

In the previous section, we discussed how a software package such as FISC can be
used to numerically approximate the scattering matrix for a complex target such as an
aircraft. For each target class of interest, we would need to use an EM solver code such
as FISC to generate a database of RCS values (off-line, prior to tracking) corresponding
to various incident and scattered directions. We could then estimate RCS along an
arbitrary flight path by interpolating between database entries.
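
The text does not say which interpolation scheme is used, so the following is only one
plausible sketch: bilinear interpolation over a uniformly sampled two-dimensional slice of
a database (elevation by azimuth). The function name, the table layout, and the sample
values are hypothetical; a full bistatic database would also be indexed by the second
direction and by frequency.

```python
import math

def bilinear_lookup(table, el0, az0, d_el, d_az, el, az):
    """Interpolate a gridded value at (el, az), all angles in degrees.

    table[i][j] is the stored value at elevation el0 + i*d_el and azimuth
    az0 + j*d_az. The query point is assumed to lie inside the grid.
    """
    u = (el - el0) / d_el
    v = (az - az0) / d_az
    # Index of the lower grid node, clamped so that i + 1 and j + 1 exist.
    i = min(max(math.floor(u), 0), len(table) - 2)
    j = min(max(math.floor(v), 0), len(table[0]) - 2)
    a, b = u - i, v - j  # fractional position inside the grid cell
    return ((1 - a) * (1 - b) * table[i][j]
            + (1 - a) * b * table[i][j + 1]
            + a * (1 - b) * table[i + 1][j]
            + a * b * table[i + 1][j + 1])

# Hypothetical 2 x 3 table of RCS values sampled every 5 degrees.
table = [[1.0, 2.0, 4.0],
         [3.0, 4.0, 8.0]]
print(bilinear_lookup(table, 0.0, 0.0, 5.0, 5.0, el=2.5, az=7.5))
```

Whether to interpolate RCS in linear units, in decibels, or on the complex coefficients
themselves is a separate design choice that this sketch does not address.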

The incident wave’s direction can then be specified by the pair $(\mathrm{EL}^i, \mathrm{AZ}^i)$, while
the scattered wave direction is indicated by $(\mathrm{EL}^s, \mathrm{AZ}^s)$. When computing RCS values
for more than one pair of incident and scattered directions, FISC imposes the requirement
that all angular values be separated by multiples of fixed elevation and azimuth
increments. Define $\Delta_f$ as the fixed frequency increment, $\Delta_\theta$ as the fixed elevation
increment, and $\Delta_\phi$ as the fixed azimuth increment. In this section, we will consider how
to choose these three values, with the intent of minimizing the size of each database by
using the largest increments possible.

Figure 2.1: HH-polarized bistatic RCS for a VFY-218 fighter and a Falcon-20 commercial
jet at 99.9 MHz and zero degrees incident and observed elevation. The color
scale for both plots (from blue to red) is [-26.55, 35.63] dBsm.

Figure 2.2: (a) Example of a positive elevation angle (EL) on the Cartesian axes. (b)
Example of a positive azimuth angle (AZ) on the Cartesian axes.

To gain insight into this choice of increments, we will assume that our target can
be modeled as a collection of ideal scattering centers, each with complex amplitude $a_n$
and position $\mathbf{r}_n = [x_n, y_n, z_n]'$ in a target-centered coordinate system with unit vectors
$\mathbf{i}_x$, $\mathbf{i}_y$, and $\mathbf{i}_z$ defining the axes [27]. This model is appropriate when the wavelength of
the incident field is small relative to the target dimensions. If we also assume that each
scattering center is visible from all observation angles, the monostatic scattering from
the target can be written as

$$G(f, \theta, \phi) = \sum_n a_n e^{-j2\mathbf{k}\cdot\mathbf{r}_n}. \tag{2.9}$$

If we substitute the expression for the wavevector

$$\mathbf{k} = \frac{2\pi f}{c}\left(\sin\theta\cos\phi\,\mathbf{i}_x + \sin\theta\sin\phi\,\mathbf{i}_y + \cos\theta\,\mathbf{i}_z\right) \tag{2.10}$$

into (2.9), we arrive at

$$G(f, \theta, \phi) = \sum_n a_n e^{-j2\pi(Xx_n + Yy_n + Zz_n)}, \tag{2.11}$$

where

$$X = \frac{2f}{c}\sin\theta\cos\phi, \quad Y = \frac{2f}{c}\sin\theta\sin\phi, \quad Z = \frac{2f}{c}\cos\theta, \tag{2.12}$$

and $c$ is the speed of light. The capitalized notation highlights the fact that $(x, y, z)$ and
$(X, Y, Z)$ form a Fourier transform pair. Because building an RCS database amounts
to uniform sampling of $G(f, \theta, \phi)$, a reasonable choice for stepsizes would be ones that
satisfy the Nyquist criterion for $G$.⁴
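
The equivalence of (2.9) and (2.11) can be checked numerically. The sketch below
evaluates the scattering-center sum both ways for a handful of made-up scattering centers;
the function names, amplitudes, and positions are hypothetical and serve only to exercise
the formulas.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s

def g_wavevector(f, theta, phi, centers):
    """Evaluate (2.9): sum of a_n * exp(-j 2 k . r_n), with k taken from (2.10)."""
    k0 = 2.0 * math.pi * f / C
    k = (k0 * math.sin(theta) * math.cos(phi),
         k0 * math.sin(theta) * math.sin(phi),
         k0 * math.cos(theta))
    return sum(a * cmath.exp(-2j * (k[0] * x + k[1] * y + k[2] * z))
               for a, (x, y, z) in centers)

def g_fourier(f, theta, phi, centers):
    """Evaluate (2.11) using the spatial frequencies X, Y, Z of (2.12)."""
    X = 2.0 * f / C * math.sin(theta) * math.cos(phi)
    Y = 2.0 * f / C * math.sin(theta) * math.sin(phi)
    Z = 2.0 * f / C * math.cos(theta)
    return sum(a * cmath.exp(-2j * math.pi * (X * x + Y * y + Z * z))
               for a, (x, y, z) in centers)

# Hypothetical scattering centers: (complex amplitude, (x, y, z) in meters).
centers = [(1.0 + 0.0j, (7.0, 0.0, 0.0)),
           (0.5j, (-6.0, 4.0, 0.5)),
           (0.8 - 0.2j, (0.0, -4.0, 1.0))]

f, theta, phi = 100e6, math.radians(80.0), math.radians(30.0)
difference = abs(g_wavevector(f, theta, phi, centers) - g_fourier(f, theta, phi, centers))
print(difference)  # expected to be at the level of floating-point round-off
```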

⁴ Building a bistatic RCS database actually amounts to uniformly sampling the function
$G(f, \theta^i, \phi^i, \theta^s, \phi^s)$. However, it can be shown that the stepsizes that satisfy the Nyquist criterion in the
monostatic scenario are lower bounds for any combination of incident and scattered directions.

To enforce the Nyquist criterion, we need to make some assumptions about the spatial
extent of the target as represented by the scattering centers $\{\mathbf{r}_n\}$. We assume that
our target is oriented so that its nose points along $\mathbf{i}_x$, and the right wing (from the pilot’s
perspective) extends along $\mathbf{i}_y$. Furthermore, the center of the smallest rectangular
volume enclosing the target is taken to be the origin of the $(x, y, z)$ coordinate system.
Finally, we also assume that all of the scattering centers $\{\mathbf{r}_n\}$ are contained within this
rectangular bounding box. With these assumptions, the Nyquist criterion requires that
the stepsizes for uniform sampling satisfy

$$\Delta_X \le \frac{1}{L}, \quad \Delta_Y \le \frac{1}{W}, \quad \text{and} \quad \Delta_Z \le \frac{1}{H}, \tag{2.13}$$

where $L$, $W$, and $H$ are the length, wingspan, and height, respectively, of the target. In
order to translate the constraints on $\Delta_X$, $\Delta_Y$, and $\Delta_Z$ into values for $\Delta_f$, $\Delta_\theta$, and $\Delta_\phi$,
we take differentials of the relationships in (2.12),

$$\begin{bmatrix} \Delta_X \\ \Delta_Y \\ \Delta_Z \end{bmatrix} = \frac{2}{c} \begin{bmatrix} \sin\theta\cos\phi & f\cos\theta\cos\phi & -f\sin\theta\sin\phi \\ \sin\theta\sin\phi & f\cos\theta\sin\phi & f\sin\theta\cos\phi \\ \cos\theta & -f\sin\theta & 0 \end{bmatrix} \begin{bmatrix} \Delta_f \\ \Delta_\theta \\ \Delta_\phi \end{bmatrix}. \tag{2.14}$$

Our task then is to find values for $\Delta_f$, $\Delta_\theta$, and $\Delta_\phi$ so that (2.13) is satisfied for any
choice of $(f, \theta, \phi)$. Clearly, the presence of $f$ as a scaling factor for several terms in
(2.14) makes it impossible to do this for all choices of $f > 0$. In order to proceed, we
must incorporate $f$ within the angular increments. Also, in the interest of simplifying
the system of inequalities, we will set the two angular increments equal to each other.
Defining $\Delta = f\Delta_\theta = f\Delta_\phi$, we can rewrite equations (2.13) and (2.14) as

$$\Delta_f \sin\theta\cos\phi + \Delta\cos(\theta + \phi) \le \frac{c}{2L}, \tag{2.15}$$

$$\Delta_f \sin\theta\sin\phi + \Delta\sin(\theta + \phi) \le \frac{c}{2W}, \tag{2.16}$$

$$\Delta_f \cos\theta - \Delta\sin\theta \le \frac{c}{2H}. \tag{2.17}$$

This set of inequalities is a compact representation of the Nyquist criterion for
uniform sampling in frequency-aspect space. To solve for the actual increments, we
have to specify the desired trade-off between $\Delta_f$ and $\Delta$, because one can be made
larger at the expense of the other. As a numeric example, consider the case where
$\Delta_f = \Delta$ and $L = \max\{L, W, H\}$. In this case, (2.15) happens to impose the most
stringent requirement on the sampling increment. Numeric evaluation yields

$$1.618\,\Delta_f \le \frac{c}{2L}. \tag{2.18}$$

If we take $c = 3 \times 10^8$ m/s, $f = 100$ MHz, and $L = 15$ m (for a fighter-sized aircraft),
we have $\Delta_f \le 6.18$ MHz. Using the largest increment possible and remembering that
we chose $\Delta = \Delta_f$ for this example, the frequency-aspect increments for our RCS
database would be $\Delta_f = 6.18$ MHz and $\Delta_\theta = \Delta_\phi = \Delta/f = 3.54^\circ$.
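
The numbers in this example can be reproduced with a short script. With $\Delta_f = \Delta$, the
left side of (2.15) equals $\Delta_f$ times $\sin\theta\cos\phi + \cos(\theta + \phi)$, so the binding
constraint is set by the largest value of that factor over all aspect angles. The sketch
below finds it by brute-force grid search; the function name and the half-degree grid are
arbitrary choices made for illustration.

```python
import math

def worst_case_factor(step_deg=0.5):
    """Largest value of sin(theta)cos(phi) + cos(theta + phi) on an angular grid."""
    best = 0.0
    n_theta = int(round(180.0 / step_deg))
    n_phi = int(round(360.0 / step_deg))
    for i in range(n_theta + 1):
        theta = math.radians(i * step_deg)
        for j in range(n_phi):
            phi = math.radians(j * step_deg)
            value = math.sin(theta) * math.cos(phi) + math.cos(theta + phi)
            best = max(best, value)
    return best

c = 3.0e8    # speed of light (m/s), rounded as in the text
f = 100.0e6  # operating frequency (Hz)
L = 15.0     # target length (m)

factor = worst_case_factor()
delta_f = c / (2.0 * L * factor)         # largest frequency step allowed by (2.18), Hz
delta_angle = math.degrees(delta_f / f)  # Delta / f with Delta = Delta_f, in degrees

print(round(factor, 3), round(delta_f / 1e6, 2), round(delta_angle, 2))
```

The three printed values should match the factor 1.618, the 6.18 MHz frequency step, and
the 3.54 degree angular step quoted above.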
CHAPTER 3

RECURSIVE BAYESIAN FILTERING


In the introduction, we stated our desire to seek a recursive Bayesian solution to our
joint tracking/classification problem. Before presenting this solution, we must first
define what is meant by “recursive Bayesian filtering.” We begin our discussion in the
most general terms, presenting concepts using random variables and density functions,
in a development that parallels [28]. Once the recursive Bayesian filtering problem
is understood in fundamental terms, we then expand our presentation by adopting a
specific system model.

Many problems in statistical signal processing and automatic control can be cast in
the framework of estimating the evolving “state” of a system given measurements that
are stochastically related to it. We will provide a more precise definition of the system
state later. For now, define $\mathbf{x}_k$ as the state of the system at time $t = k\Delta t$, where $\Delta t$
is the sampling interval. Thus, $\mathbf{x}_{1:k} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k\}$ is a sequence of state values
at $t = \Delta t, 2\Delta t, \ldots, k\Delta t$. Because of either noise in the state evolution process or
uncertainty as to the exact nature of the process itself, $\mathbf{x}_k$ is generally regarded as a
random variable. We assume that information concerning this unknown state sequence
is conveyed through a measurement process. Define $\mathbf{z}_k$ as the measurement produced
by the system at $t = k\Delta t$ and, similar to our state notation, let $\mathbf{z}_{1:k} = \{\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_k\}$
denote a sequence of measurements. Here again, because of either noise in the measurement
process or uncertainty in the underlying state sequence, $\mathbf{z}_k$ is also regarded
as a random variable.

With these definitions, we can now be more precise about our state estimation
(or filtering) problem. Namely, for the purpose of monitoring or controlling a given
stochastic system, we wish to estimate its evolving state $\mathbf{x}_k$ using all measurements $\mathbf{z}_{1:k}$
collected up to the current time. Because $\mathbf{x}_k$ is a random variable, all information provided
by $\mathbf{z}_{1:k}$ is conveyed by the posterior density, $p(\mathbf{x}_k|\mathbf{z}_{1:k})$.¹ A Bayesian filtering
algorithm is any technique that produces an estimate of $p(\mathbf{x}_k|\mathbf{z}_{1:k})$ for each $k = 1, 2, \ldots$.
That is, instead of producing an estimate of $\mathbf{x}_k$ directly, a Bayesian filtering algorithm
supplies an estimate of the posterior $p(\mathbf{x}_k|\mathbf{z}_{1:k})$. We note that given the posterior density,
it is straightforward (conceptually) to produce any desired statistic of $\mathbf{x}_k$. For
instance, a minimum mean-square error (MMSE) estimate of the current state could
be found by computing the conditional mean: $\hat{\mathbf{x}}_k = \int \mathbf{x}_k\, p(\mathbf{x}_k|\mathbf{z}_{1:k})\, d\mathbf{x}_k$. A recursive
Bayesian filtering algorithm imposes the constraint that the estimate of $p(\mathbf{x}_k|\mathbf{z}_{1:k})$
should be generated solely from the previous posterior density $p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})$ and the
most recent measurement $\mathbf{z}_k$. In this way, the problem of storing $\mathbf{z}_{1:k}$, the entire measurement
sequence, is avoided. Thus, we can summarize with the following definition.

¹ In a slight abuse of notation, density functions will be identified by their arguments whenever this does
not cause confusion (e.g., $p(\mathbf{x}_k|\mathbf{z}_{1:k})$ instead of the cumbersome $p_{\mathbf{x}_k|\mathbf{z}_{1:k}}(\mathbf{x}_k|\mathbf{z}_{1:k})$). Furthermore, we
will use the same notation to refer to both random variables and their realizations, such as the use of $\mathbf{z}_{1:k}$ in
$p(\mathbf{x}_k|\mathbf{z}_{1:k})$.

Definition 1 Recursive Bayesian Filtering

Given $p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})$ and the most recent measurement $\mathbf{z}_k$, calculate (or estimate) the
posterior density $p(\mathbf{x}_k|\mathbf{z}_{1:k})$.

In general, it is not possible to perform Bayesian filtering recursively (i.e., without
storing the entire measurement sequence $\mathbf{z}_{1:k}$). To see why this is so, it is convenient
to think of a two-step process that takes $p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})$ and $\mathbf{z}_k$ as inputs and returns
the desired posterior $p(\mathbf{x}_k|\mathbf{z}_{1:k})$ as the output. The first step of this process is commonly
referred to as prediction, and it maps the previous posterior $p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})$ into
the one-step prediction density $p(\mathbf{x}_k|\mathbf{z}_{1:k-1})$. The second step, the measurement update,
combines the most recent observation $\mathbf{z}_k$ and $p(\mathbf{x}_k|\mathbf{z}_{1:k-1})$ from the prediction
step to produce the desired posterior $p(\mathbf{x}_k|\mathbf{z}_{1:k})$. Formulas for these two steps follow
immediately from Bayes’ rule.

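Definition 2 Bayesian Prediction

$$p(\mathbf{x}_k|\mathbf{z}_{1:k-1}) = \int p(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_{1:k-1})\, p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})\, d\mathbf{x}_{k-1} \tag{3.1}$$

Definition 3 Bayesian Measurement Update

$$p(\mathbf{x}_k|\mathbf{z}_{1:k}) = \frac{p(\mathbf{z}_k|\mathbf{x}_k, \mathbf{z}_{1:k-1})\, p(\mathbf{x}_k|\mathbf{z}_{1:k-1})}{p(\mathbf{z}_k|\mathbf{z}_{1:k-1})} \tag{3.2}$$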

Upon examining these two definitions, we see immediately why Bayesian filtering
cannot be performed recursively, in general. Specifically, the presence of the terms
$p(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_{1:k-1})$ in (3.1) and $p(\mathbf{z}_k|\mathbf{x}_k, \mathbf{z}_{1:k-1})$ in (3.2) requires that we store all previous
measurements $\mathbf{z}_{1:k-1}$. As an aside, we note that the denominator in (3.2) is not
troublesome because it is constant with respect to $\mathbf{x}_k$ and can be found (theoretically) by
integrating the numerator, $p(\mathbf{z}_k|\mathbf{z}_{1:k-1}) = \int p(\mathbf{z}_k|\mathbf{x}_k, \mathbf{z}_{1:k-1})\, p(\mathbf{x}_k|\mathbf{z}_{1:k-1})\, d\mathbf{x}_k$. Thus, in
order to perform Bayesian filtering recursively, we must impose constraints on our stochastic
system that allow the conditioning on $\mathbf{z}_{1:k-1}$ in $p(\mathbf{x}_k|\mathbf{x}_{k-1}, \mathbf{z}_{1:k-1})$ and $p(\mathbf{z}_k|\mathbf{x}_k, \mathbf{z}_{1:k-1})$
to be dropped.

The first assumption we make is that the state sequence is Markov,

$$p(\mathbf{x}_k|\mathbf{x}_{0:k-1}) = p(\mathbf{x}_k|\mathbf{x}_{k-1}). \tag{3.3}$$

Conceptually, for a Markov process, the value of the random variable $\mathbf{x}_{k-1}$ provides as
much information about $\mathbf{x}_k$ as the value of the process at $t = (k-1)\Delta t$ and all previous
time instants. In this sense, the typical notion of “state” is actually seen to follow from
the definition of a Markov process; namely, the state of a system $\mathbf{x}_k$ is any collection
of variables that makes the path history $\mathbf{x}_{0:k-1}$ inconsequential for estimating future
values of the sequence. The second assumption we make is that observations occur
through a memoryless channel. In other words, conditioned on $\mathbf{x}_k$, $\mathbf{z}_k$ is assumed to be
independent of the rest of the state sequence and all other measurements. This can be
expressed as

$$p(\mathbf{z}_{1:k}|\mathbf{x}_{1:k}) = \prod_{i=1}^{k} p(\mathbf{z}_i|\mathbf{x}_i). \tag{3.4}$$

With these two assumptions, we are able to prove the following two lemmas, which will
allow us to compute $p(\mathbf{x}_k|\mathbf{z}_{1:k})$ from $p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})$ without the past measurement
sequence $\mathbf{z}_{1:k-1}$.


Lemma 1 Recursive Bayesian Prediction

Given a Markov state sequence $\{\mathbf{x}_k\}$ observed through a memoryless channel,

$$p(\mathbf{x}_k|\mathbf{z}_{1:k-1}) = \int p(\mathbf{x}_k|\mathbf{x}_{k-1})\, p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})\, d\mathbf{x}_{k-1}. \tag{3.5}$$


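Proof:

By the Theorem of Total Probability and Bayes’ rule,

$$\begin{aligned} p(\mathbf{x}_k|\mathbf{z}_{1:k-1}) &= \int p(\mathbf{x}_k, \mathbf{x}_{1:k-1}|\mathbf{z}_{1:k-1})\, d\mathbf{x}_{1:k-1} \\ &= \int \frac{p(\mathbf{z}_{1:k-1}|\mathbf{x}_{1:k})\, p(\mathbf{x}_{1:k})}{p(\mathbf{z}_{1:k-1})}\, d\mathbf{x}_{1:k-1}. \end{aligned} \tag{3.6}$$

Next, we use the Markov assumption to express $p(\mathbf{x}_{1:k})$ as $p(\mathbf{x}_k|\mathbf{x}_{k-1})\, p(\mathbf{x}_{1:k-1})$. In
addition, the conditional independence of the observation sequence allows us to drop
the conditioning on $\mathbf{x}_k$ in (3.6). This leaves us with

\begin{align}
p(\mathbf{x}_k|\mathbf{z}_{1:k-1}) &= \int \frac{p(\mathbf{z}_{1:k-1}|\mathbf{x}_{1:k-1})\, p(\mathbf{x}_{1:k-1})\, p(\mathbf{x}_k|\mathbf{x}_{k-1})}{p(\mathbf{z}_{1:k-1})}\, d\mathbf{x}_{1:k-1} \tag{3.7} \\
&= \int \!\! \int p(\mathbf{x}_{k-1}, \mathbf{x}_{1:k-2}|\mathbf{z}_{1:k-1})\, p(\mathbf{x}_k|\mathbf{x}_{k-1})\, d\mathbf{x}_{1:k-2}\, d\mathbf{x}_{k-1} \tag{3.8} \\
&= \int p(\mathbf{x}_k|\mathbf{x}_{k-1})\, p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})\, d\mathbf{x}_{k-1}, \tag{3.9}
\end{align}

where we used Bayes’ rule to go from (3.7) to (3.8).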

In Lemma 1, we showed that when the state sequence is Markov and the observations
occur through a memoryless channel, the one-step prediction density $p(\mathbf{x}_k|\mathbf{z}_{1:k-1})$
can be computed from the previous posterior $p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})$ without storing $\mathbf{z}_{1:k-1}$.
Now, we demonstrate how the one-step prediction density can be updated upon reception
of a new observation $\mathbf{z}_k$.


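Lemma 2 Recursive Bayesian Update

Given a measurement sequence $\{\mathbf{z}_k\}$ that is conditionally independent given the state
sequence $\{\mathbf{x}_k\}$,

$$p(\mathbf{x}_k|\mathbf{z}_{1:k}) = \frac{p(\mathbf{z}_k|\mathbf{x}_k)\, p(\mathbf{x}_k|\mathbf{z}_{1:k-1})}{p(\mathbf{z}_k|\mathbf{z}_{1:k-1})}. \tag{3.10}$$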

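Proof:

We follow the proof given in [28]. Using Bayes’ rule,

\begin{align}
p(\mathbf{x}_k|\mathbf{z}_{1:k}) &= \frac{p(\mathbf{z}_k, \mathbf{z}_{1:k-1}|\mathbf{x}_k)\, p(\mathbf{x}_k)}{p(\mathbf{z}_{1:k})} \tag{3.11} \\
&= \frac{p(\mathbf{z}_k|\mathbf{x}_k)\, p(\mathbf{z}_{1:k-1}|\mathbf{x}_k)\, p(\mathbf{x}_k)}{p(\mathbf{z}_{1:k})}, \tag{3.12}
\end{align}

where we used the conditional independence of the measurements given $\mathbf{x}_k$ to go from (3.11) to (3.12).
Applying Bayes’ rule to $p(\mathbf{z}_{1:k-1}|\mathbf{x}_k)$ yields

\begin{align}
p(\mathbf{x}_k|\mathbf{z}_{1:k}) &= \frac{p(\mathbf{z}_k|\mathbf{x}_k)\, p(\mathbf{x}_k|\mathbf{z}_{1:k-1})\, p(\mathbf{z}_{1:k-1})}{p(\mathbf{z}_{1:k})} \tag{3.13} \\
&= \frac{p(\mathbf{z}_k|\mathbf{x}_k)\, p(\mathbf{x}_k|\mathbf{z}_{1:k-1})}{p(\mathbf{z}_k|\mathbf{z}_{1:k-1})}, \tag{3.14}
\end{align}

where the marginal of $\mathbf{x}_k$, present in both the numerator and denominator, has been
canceled in (3.13).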


In Lemma 2, we note that the Markov assumption was not needed. Rather, the
conditional independence of the measurement process was enough to derive $p(\mathbf{x}_k|\mathbf{z}_{1:k})$
from $p(\mathbf{x}_k|\mathbf{z}_{1:k-1})$, given only the current observation $\mathbf{z}_k$. In summary, while Bayesian
filtering cannot be performed recursively in general, it can be accomplished for stochastic
systems with Markov state sequences and conditionally independent measurement
processes. We are now in a position to provide the motivation for our specific choice
of system model.
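
To make the two-step recursion of Lemmas 1 and 2 concrete, the sketch below carries
it out numerically for a scalar state by replacing the integrals in (3.5) and (3.10) with
Riemann sums on a fixed grid. This grid approximation, the function names, and the
Gaussian random-walk model used in the example are all assumptions chosen for
illustration; they are not the method developed in this thesis.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def predict(grid, prior, transition_pdf):
    """Riemann-sum version of (3.5): integrate p(x_k|x_{k-1}) p(x_{k-1}|z_{1:k-1})."""
    dx = grid[1] - grid[0]
    return [sum(transition_pdf(xk, xp) * pp for xp, pp in zip(grid, prior)) * dx
            for xk in grid]

def update(grid, predicted, z, likelihood_pdf):
    """Riemann-sum version of (3.10): multiply by p(z_k|x_k) and renormalize."""
    dx = grid[1] - grid[0]
    unnormalized = [likelihood_pdf(z, xk) * p for xk, p in zip(grid, predicted)]
    evidence = sum(unnormalized) * dx  # approximates p(z_k | z_{1:k-1})
    return [u / evidence for u in unnormalized]

# Hypothetical scalar model: random-walk state observed directly in noise.
grid = [-10.0 + 0.1 * i for i in range(201)]
prior = [gaussian_pdf(x, 0.0, 1.0) for x in grid]
transition = lambda xk, xp: gaussian_pdf(xk, xp, 1.0)  # p(x_k | x_{k-1})
likelihood = lambda z, xk: gaussian_pdf(z, xk, 1.0)    # p(z_k | x_k)

posterior = update(grid, predict(grid, prior, transition), 1.0, likelihood)
dx = grid[1] - grid[0]
mean = sum(x * p for x, p in zip(grid, posterior)) * dx  # conditional-mean estimate
print(mean)
```

Note that only the previous posterior and the newest measurement enter each step; no
earlier measurements are stored.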

Consider a stochastic system model, formulated in discrete time,

$$\mathbf{x}_k = f_k(\mathbf{x}_{k-1}, \mathbf{u}_k), \tag{3.15}$$

where $\mathbf{x}_k$ is an $n_x \times 1$ state vector, and $\mathbf{u}_k$ is an $n_u \times 1$ process noise vector. As such,
each (potentially time-varying) function $f_k$ maps $\mathbb{R}^{n_x} \times \mathbb{R}^{n_u}$ to $\mathbb{R}^{n_x}$. The sequence
$\{\mathbf{u}_k\}$ can be used to model either a random perturbation of the state process or uncertainty
in our knowledge of $f_k$ (or both). This is a powerful formulation that allows us to
approximate a differential equation of arbitrary order as a first-order vector difference
equation. Next, consider a stochastic measurement equation of the form

$$\mathbf{z}_k = h_k(\mathbf{x}_k, \mathbf{w}_k), \tag{3.16}$$

where $\mathbf{z}_k$ is an $n_z \times 1$ observation vector corresponding to state $\mathbf{x}_k$, and $\mathbf{w}_k$ is an
$n_w \times 1$ measurement noise vector. The observation function $h_k$, which is also allowed
to vary with time, maps $\mathbb{R}^{n_x} \times \mathbb{R}^{n_w}$ to $\mathbb{R}^{n_z}$. Similar to the process noise, the sequence
$\{\mathbf{w}_k\}$ can be used to model either a random perturbation of the observation process or
uncertainty in our knowledge of $h_k$ (or both). Furthermore, we note that while $f_k$ and
$h_k$ are assumed to be known for all $k$, no assumptions are made as to their functional
description (e.g., whether they are linear or nonlinear). The model described by (3.15)
and (3.16) is often referred to as a dynamic system.
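
A dynamic system of the form (3.15)-(3.16) is straightforward to simulate once $f_k$, $h_k$,
and the noise sources are specified. The following sketch does so for scalar quantities;
the function `simulate`, the particular nonlinear model, and the noise variances are
hypothetical choices, and the noise samples are assumed to be drawn independently.

```python
import random

def simulate(f, h, x0, n_steps, draw_u, draw_w):
    """Generate states x_1..x_n and measurements z_1..z_n from (3.15)-(3.16).

    f(k, x_prev, u) and h(k, x, w) may depend on the time index k.
    draw_u() and draw_w() return independent noise samples.
    """
    states, measurements = [], []
    x = x0
    for k in range(1, n_steps + 1):
        x = f(k, x, draw_u())   # state equation (3.15)
        z = h(k, x, draw_w())   # measurement equation (3.16)
        states.append(x)
        measurements.append(z)
    return states, measurements

# Hypothetical scalar example with a nonlinear, non-additive measurement.
rng = random.Random(0)
f = lambda k, x, u: 0.9 * x + u
h = lambda k, x, w: (x ** 2) * (1.0 + w)
x0 = rng.gauss(0.0, 1.0)
states, measurements = simulate(
    f, h, x0, 5,
    draw_u=lambda: rng.gauss(0.0, 0.5),
    draw_w=lambda: rng.gauss(0.0, 0.1),
)
print(states)
print(measurements)
```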

We wish to perform Bayesian filtering on the dynamic system described by (3.15)
and (3.16). In order to avoid the growing memory problem inherent in computing
$p(\mathbf{x}_k|\mathbf{z}_{1:k})$ for all $k$, we insist upon a recursive solution to the filtering problem. The
requirements remain the same as before: namely, that $\{\mathbf{x}_k\}$ be Markov and $\mathbf{z}_{1:k}$ be
conditionally independent given $\mathbf{x}_{1:k}$. However, by adopting a dynamic system model,
these two assumptions can be restated solely in terms of the noise sequences, $\{\mathbf{u}_k\}$ and
$\{\mathbf{w}_k\}$, and the random variable $\mathbf{x}_0$ corresponding to the initial state. We provide the
result in the following proposition.

Proposition 1 Recursive Bayesian Filtering for Dynamic Systems

The dynamic system described by (3.15) and (3.16) admits a recursive solution to the
Bayesian filtering problem, as described in Lemmas 1 and 2, if the following conditions
hold.

1. The noise vectors $\{\mathbf{u}_k\}$ form an independent sequence.

2. The noise vectors $\{\mathbf{w}_k\}$ form an independent sequence.

3. The sequences $\{\mathbf{u}_k\}$ and $\{\mathbf{w}_k\}$ and the random variable $\mathbf{x}_0$ are all mutually
independent.

The proof follows by applying properties (3.3) and (3.4) to the dynamic system model.

In this section, we have defined the recursive Bayesian filtering problem and provided
its solution. Lemmas 1 and 2 show how $p(\mathbf{x}_k|\mathbf{z}_{1:k})$ can be calculated from
$p(\mathbf{x}_{k-1}|\mathbf{z}_{1:k-1})$ and $\mathbf{z}_k$ using the two-step procedure of prediction and measurement
update. For the specific case of a dynamic system model, Proposition 1 provides a
sufficient set of criteria for the existence of a recursive solution. We note that the three
criteria are satisfied by many stochastic models of interest, including the one we will
propose for joint tracking/classification. At first glance, it might seem that this settles
the matter. Unfortunately, the solution provided by Lemmas 1 and 2 is conceptual;
analytical solutions are only available for a handful of systems. More specifically, the
integral in (3.5) and the integration needed to compute $p(\mathbf{z}_k|\mathbf{z}_{1:k-1})$ in (3.10) generally
do not have closed-form solutions. Furthermore, even if the posterior $p(\mathbf{x}_k|\mathbf{z}_{1:k})$ can
be found, estimates such as $\hat{\mathbf{x}}_k = \int \mathbf{x}_k\, p(\mathbf{x}_k|\mathbf{z}_{1:k})\, d\mathbf{x}_k$ are likely to be intractable. In
the next section, we discuss two cases where a closed-form solution to the prediction
and measurement update equations can be found.

3.1 Exact Recursive Bayesian Algorithms

3.1.1 The HMM filter


In the previous section, we showed that if $x_{1:k}$ is Markov and $\mathbf{z}_{1:k}$ is conditionally
independent given $x_{1:k}$, a conceptual solution to the recursive Bayesian filtering problem
can be found. Unfortunately, the two-step process of (3.5) and (3.10) does not, in
general, admit an analytic solution. However, if $x_k$ is drawn from a state space that is
both discrete and finite, the integrals in (3.5) and (3.10) become (finite) sums, which
are easily computed. More specifically, the prediction equation becomes

$$\Pr(x_k = i|\mathbf{z}_{1:k-1}) = \sum_{j=1}^{N} \Pr(x_k = i|x_{k-1} = j)\, \Pr(x_{k-1} = j|\mathbf{z}_{1:k-1}) \tag{3.17}$$

for $i = 1, 2, \ldots, N$, where $N$ is the number of states in the state space. The term
$\Pr(x_k = i|x_{k-1} = j)$ is an element from the Markov transition matrix. Likewise, the update
equation simplifies to

$$\Pr(x_k = i|\mathbf{z}_{1:k}) = \frac{p(\mathbf{z}_k|x_k = i)\, \Pr(x_k = i|\mathbf{z}_{1:k-1})}{\sum_{j=1}^{N} p(\mathbf{z}_k|x_k = j)\, \Pr(x_k = j|\mathbf{z}_{1:k-1})}, \tag{3.18}$$

where the normalization term in the denominator has been expanded to indicate how
it would be computed. Thus, the recursion in Equations (3.17) and (3.18) requires
only the ability to evaluate the transition probabilities $\Pr(x_k = i|x_{k-1} = j)$ and
the likelihood function $p(\mathbf{z}_k|x_k = i)$. Furthermore, the likelihood only needs to be
evaluated up to a normalization constant because of its presence in both the numerator
and denominator of (3.18).
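
The recursion in (3.17)-(3.18) translates almost line for line into code. The sketch below
is a minimal version for a finite state space; the function names, the two-state transition
matrix, and the likelihood values are hypothetical. The transition matrix is stored so that
entry `[i][j]` holds $\Pr(x_k = i|x_{k-1} = j)$, which is an indexing assumption of this example.

```python
def hmm_predict(prev_posterior, transition):
    """Prediction step (3.17).

    transition[i][j] = Pr(x_k = i | x_{k-1} = j), so each column sums to one.
    """
    n = len(prev_posterior)
    return [sum(transition[i][j] * prev_posterior[j] for j in range(n))
            for i in range(n)]

def hmm_update(predicted, likelihood):
    """Update step (3.18); likelihood[i] is proportional to p(z_k | x_k = i)."""
    unnormalized = [l * p for l, p in zip(likelihood, predicted)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical two-state example with two successive measurements.
transition = [[0.9, 0.2],
              [0.1, 0.8]]
posterior = [0.5, 0.5]
for likelihood in ([2.0, 1.0], [0.3, 0.6]):
    predicted = hmm_predict(posterior, transition)
    posterior = hmm_update(predicted, likelihood)
    print(predicted, posterior)
```

Because `hmm_update` renormalizes, scaling every likelihood value by the same constant
leaves the posterior unchanged, in line with the remark above.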

State estimation algorithms based upon the recursion in (3.17)-(3.18) are often
referred to as hidden Markov model (HMM) filters. These types of filters are widely
used in speech recognition where it is assumed that all words are formed from the
concatenation of basic language units called phonemes [29]. An example of a phoneme
is the “I” vowel sound in the word “hi.” Each language has its own distinct set, with
approximately 40 for American English. The unknown (or hidden) state sequence $x_{1:k}$
is the set of phonemes uttered by the speaker. The observation sequence $\mathbf{z}_{1:k}$ models
both the human vocal mechanism and the measurement process (e.g., noise). Often,
the likelihood $p(\mathbf{z}_k|x_k = i)$ is taken to be a Gaussian mixture. As such, the HMM
filter applies directly to the speech recognition problem.²

While it is true that HMM filters have revolutionized the field of speech recognition,
they are only applicable to discrete state spaces or those that can be reasonably
approximated as such. For our application, joint tracking/classification of airborne targets,
this assumption is not valid. In the next section, we will present a second exact
recursive Bayesian algorithm, but this one will apply to continuous state spaces.

² Actually, most speech recognizers find the maximum a posteriori estimate of $x_{1:N_k}$ given $\mathbf{z}_{1:N_k}$, where
$N_k$ is the total number of samples. In this case, a common approach is to use the Viterbi algorithm, which
replaces (3.17) by the computation $\max_j \{\Pr(x_k = i|x_{k-1} = j)\, \Pr(x_{k-1} = j|\mathbf{z}_{1:k-1})\}$ for each $i$ [30].

3.1.2 The Kalman filter


In this section, we maintain the assumptions necessary for recursive Bayesian filtering.
Thus, $\mathbf{x}_{1:k}$ is Markov, and $\mathbf{z}_{1:k}$ is conditionally independent given $\mathbf{x}_{1:k}$ (or, for a dynamic
system, $\{\mathbf{u}_k\}$, $\{\mathbf{w}_k\}$, and $\mathbf{x}_0$ are both individually and mutually independent for
all $k$). However, instead of assuming that $\mathbf{x}_k$ is drawn from a discrete state space, we
assume that the system model takes the form

\begin{align}
\mathbf{x}_k &= F_k \mathbf{x}_{k-1} + \mathbf{u}_k, \tag{3.19} \\
\mathbf{z}_k &= H_k \mathbf{x}_k + \mathbf{w}_k, \tag{3.20}
\end{align}

where $F_k$ and $H_k$ are appropriately dimensioned matrices. We recognize immediately
that (3.19)-(3.20) represents a dynamic system model where the functions $f_k$ and $h_k$,
though still permitted to vary with time, are now linear. With this stochastic model, we
make the following proposition.

Proposition 2 Exact Recursive Bayesian Filtering for Linear, Gaussian Systems

Consider a dynamic system that satisfies (3.19)-(3.20), where $\{\mathbf{x}_k\}$ is Markov and $\{\mathbf{z}_k\}$
is conditionally independent given $\{\mathbf{x}_k\}$. If the initial state $\mathbf{x}_0$ and the noise processes
$\{\mathbf{u}_k\}$ and $\{\mathbf{w}_k\}$ are each Gaussian for all $k$, then the Kalman filter provides the exact
recursive Bayesian solution for $p(\mathbf{x}_k|\mathbf{z}_{1:k})$.

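To see why this might be so, it is helpful to consider the actual Kalman filtering
equations. We will denote a Gaussian distribution with mean $\mu$ and covariance matrix
$\Sigma$ as $\mathcal{N}(\mu, \Sigma)$. Similarly, evaluation of the same Gaussian density function at $\mathbf{x}$ will
be denoted as $\mathcal{N}(\mathbf{x}; \mu, \Sigma)$. Then, as specified in Proposition 2, we assume $\mathbf{u}_k \sim
\mathcal{N}(\mathbf{0}, Q_k)$, $\mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, R_k)$, and $\mathbf{x}_0 \sim \mathcal{N}(\mu_0, \Sigma_0)$, where the statistics $Q_k$, $R_k$, $\mu_0$,
and $\Sigma_0$ are assumed to be known for all $k$.³ Then, with the notation

\begin{align}
\hat{\mathbf{x}}_{k|k} &= E[\mathbf{x}_k|\mathbf{z}_{1:k}], \tag{3.21} \\
\Sigma_{k|k} &= E[(\mathbf{x}_k - \hat{\mathbf{x}}_{k|k})(\mathbf{x}_k - \hat{\mathbf{x}}_{k|k})'|\mathbf{z}_{1:k}], \tag{3.22}
\end{align}

we can now state the Kalman filter equations [31, 32]:

\begin{align}
\hat{\mathbf{x}}_{k|k-1} &= F_k \hat{\mathbf{x}}_{k-1|k-1}, \tag{3.23} \\
\Sigma_{k|k-1} &= F_k \Sigma_{k-1|k-1} F_k' + Q_k, \tag{3.24} \\
\hat{\mathbf{x}}_{k|k} &= \hat{\mathbf{x}}_{k|k-1} + K_k(\mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1}), \tag{3.25} \\
\Sigma_{k|k} &= \Sigma_{k|k-1} - K_k H_k \Sigma_{k|k-1}, \tag{3.26}
\end{align}

where

\begin{align}
K_k &= \Sigma_{k|k-1} H_k' S_k^{-1}, \tag{3.27} \\
S_k &= H_k \Sigma_{k|k-1} H_k' + R_k. \tag{3.28}
\end{align}

Here, $K_k$ is the Kalman gain, and $S_k$ is the covariance of the innovation term, $\mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1}$.

³ The noise processes $\{\mathbf{u}_k\}$ and $\{\mathbf{w}_k\}$ do not need to be zero-mean for the Kalman filter to be Bayesian
optimal. However, because their mean values must be known for all $k$, it would be straightforward to recast
the system in terms of zero-mean pseudo-noises and known control inputs.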

At first glance, it might not seem that (3.23)-(3.26) have much to do with the recursive
Bayesian filtering solution presented in Lemmas 1 and 2. However, the precise
structure of the stochastic model (linear and Gaussian) permits a significant simplification
of the prediction/update procedure. Because $\mathbf{x}_0$ and $\mathbf{u}_1$ are jointly Gaussian, $\mathbf{x}_1$
will be Gaussian. Then, because $\mathbf{w}_1$ is Gaussian, $\mathbf{z}_1|\mathbf{x}_1$ will be Gaussian. Generalizing
this line of reasoning, because Gaussian distributions are preserved under linear
transformations, it can be shown that both the one-step prediction density $p(\mathbf{x}_k|\mathbf{z}_{1:k-1})$
and the posterior $p(\mathbf{x}_k|\mathbf{z}_{1:k})$ are Gaussian for all $k$. Because a Gaussian distribution is
completely specified by its first two moments, it suffices to propagate the conditional
mean $E[\mathbf{x}_k|\mathbf{z}_{1:k}]$ and the conditional covariance matrix $\mathrm{var}[\mathbf{x}_k|\mathbf{z}_{1:k}]$. Inspection of
(3.23)-(3.26) reveals that the Kalman filtering algorithm accomplishes precisely this.
Equations (3.23)-(3.24) are the analytic solution for the Bayesian prediction step, while
Equations (3.25)-(3.26) are the analytic solution for the Bayesian measurement update.
Thus, we see that the well-known Kalman filter provides the exact recursive Bayesian
solution for linear Gaussian systems.
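
For readers who prefer code to matrix algebra, the sketch below implements one cycle of
(3.23)-(3.28) in the scalar special case, where every matrix reduces to a number and
transposes disappear. The function name, the random-walk model, and the noise variances
in the example are hypothetical; a vector-valued implementation would replace the
products and the division with matrix operations.

```python
def kalman_step(x_prev, P_prev, z, F, H, Q, R):
    """One scalar prediction/update cycle following (3.23)-(3.28)."""
    x_pred = F * x_prev                    # (3.23) predicted mean
    P_pred = F * P_prev * F + Q            # (3.24) predicted variance
    S = H * P_pred * H + R                 # (3.28) innovation variance
    K = P_pred * H / S                     # (3.27) Kalman gain
    x_new = x_pred + K * (z - H * x_pred)  # (3.25) updated mean
    P_new = P_pred - K * H * P_pred        # (3.26) updated variance
    return x_new, P_new

# Hypothetical random-walk example: F = H = 1 with assumed noise variances.
x, P = 0.0, 1.0
for z in [1.2, 0.9, 1.1]:
    x, P = kalman_step(x, P, z, F=1.0, H=1.0, Q=0.01, R=0.25)
    print(x, P)
```

Only the mean and variance are carried from one step to the next, which is exactly the
pair of moments that specifies the Gaussian posterior.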

3.2  Suboptimal  Recursive  Bayesian  Algorithms 

In  the  previous  section,  we  discussed  two  filtering  problems  for  which  exact  recursive 
Bayesian  solutions  exist.  Unfortunately,  for  system  models  that  are  neither  discrete  nor 
linear-Gaussian,  it  is  typically  impossible  to  implement  the  recursive  Bayesian  solution 
provided  by  Lemmas  1  and  2.  Instead,  approximations  to  the  optimal  filter  are  often 
used.  Because  the  state  space  for  tracking  problems  is  inherently  continuous  (or  mixed, 
in  the  case  of  joint  tracking/classification),  most  suboptimal  algorithms  in  the  tracking 
literature  are  based  on  the  Kalman  filter.  Broadly  speaking,  these  algorithms  can  be 
divided  into  two  categories:  (1)  those  that  approximate  the  posterior  as  Gaussian,  and 
(2)  those  that  approximate  the  posterior  as  a  sum  of  basis  functions.  In  this  section,  we 
will  present  examples  of  both  approaches  to  suboptimal  recursive  Bayesian  filtering.  In 
the  end,  though,  we  will  find  that  none  are  suitable  for  FM-band  passive  radar  tracking 
and  classification. 

3.2.1  Single  Gaussian  approximations 

The  majority  of  suboptimal  Bayesian  filters  fall  in  the  category  that  we  term  single 
Gaussian  approximations.  All  of  the  filters  in  this  class  approximate  the  true  posterior 
as  a  Gaussian  distribution.  We  begin  with  the  most  popular  of  these,  the  extended 
Kalman filter.

Extended Kalman filter


The  extended  Kalman  filter  (EKF)  applies  the  Kalman  filtering  recursion  to  the  case 
of  nonlinear  state  and  measurement  equations  [33].  The  dynamic  system  most  often 
considered  is 


$\mathbf{x}_k = f_k(\mathbf{x}_{k-1}) + \mathbf{u}_k,$  (3.29)

$\mathbf{z}_k = h_k(\mathbf{x}_k) + \mathbf{w}_k,$  (3.30)

where  the  noise  processes  are  assumed  to  be  Gaussian,  though  fk  and  hk  can  now  be 
nonlinear.  The  extended  Kalman  filter  is  derived  by  linearizing  fk  and  hk  using  the 
first-order  Taylor  series  approximations 


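$f_k(\mathbf{x}_{k-1}) \approx f_k(\hat{\mathbf{x}}_{k-1|k-1}) + F_k (\mathbf{x}_{k-1} - \hat{\mathbf{x}}_{k-1|k-1}),$  (3.31)

$h_k(\mathbf{x}_k) \approx h_k(\hat{\mathbf{x}}_{k|k-1}) + H_k (\mathbf{x}_k - \hat{\mathbf{x}}_{k|k-1}),$  (3.32)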


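where $F_k$ and $H_k$ are the Jacobian matrices,

$F_k \triangleq \left. \frac{\partial f_k(\mathbf{x})}{\partial \mathbf{x}} \right|_{\mathbf{x} = \hat{\mathbf{x}}_{k-1|k-1}},$  (3.33)

$H_k \triangleq \left. \frac{\partial h_k(\mathbf{x})}{\partial \mathbf{x}} \right|_{\mathbf{x} = \hat{\mathbf{x}}_{k|k-1}}.$  (3.34)

Using the linear approximations in (3.31) and (3.32) in place of $f_k$ and $h_k$ in the standard
Kalman filter derivation yields the following recursive algorithm:

$\hat{\mathbf{x}}_{k|k-1} = f_k(\hat{\mathbf{x}}_{k-1|k-1}),$  (3.35)

$\Sigma_{k|k-1} = F_k \Sigma_{k-1|k-1} F_k' + Q_k,$  (3.36)

$\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + K_k (\mathbf{z}_k - h_k(\hat{\mathbf{x}}_{k|k-1})),$  (3.37)

$\Sigma_{k|k} = \Sigma_{k|k-1} - K_k H_k \Sigma_{k|k-1}.$  (3.38)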

The definitions for $K_k$ and $S_k$ are identical to (3.27) and (3.28), except that the
measurement matrix is replaced by the Jacobian $H_k$ from (3.34). The EKF algorithm
approximates the posterior density $p(\mathbf{x}_k | \mathbf{z}_{1:k})$ as Gaussian with mean
$\hat{\mathbf{x}}_{k|k}$ from (3.37) and covariance $\Sigma_{k|k}$ from (3.38).
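
As an illustration only, the recursion (3.35)-(3.38) can be sketched for a scalar state. The function name `ekf_step` and its arguments are hypothetical, and the sketch assumes the usual forms $S_k = H_k \Sigma_{k|k-1} H_k' + R_k$ and $K_k = \Sigma_{k|k-1} H_k' S_k^{-1}$ for (3.27) and (3.28), which are not reproduced in this section.

```python
def ekf_step(x_est, p_est, z, f, h, df, dh, q, r):
    """One scalar EKF iteration (illustrative sketch of (3.35)-(3.38)).

    x_est, p_est : previous estimate and variance
    z            : current measurement
    f, h         : state and measurement functions
    df, dh       : their derivatives (scalar Jacobians F_k and H_k)
    q, r         : process and measurement noise variances
    """
    # Prediction, (3.35)-(3.36); F_k is evaluated at the previous estimate.
    x_pred = f(x_est)
    F = df(x_est)
    p_pred = F * p_est * F + q

    # Innovation variance and gain (assumed standard Kalman forms);
    # H_k is evaluated at the predicted state.
    H = dh(x_pred)
    s = H * p_pred * H + r
    k = p_pred * H / s

    # Measurement update, (3.37)-(3.38).
    x_new = x_pred + k * (z - h(x_pred))
    p_new = p_pred - k * H * p_pred
    return x_new, p_new


if __name__ == "__main__":
    # Hypothetical example: random-walk state observed through h(x) = x**2.
    print(ekf_step(1.0, 1.0, 1.5,
                   lambda x: x, lambda x: x * x,
                   lambda x: 1.0, lambda x: 2.0 * x,
                   0.0, 1.0))
```

When `f` and `h` are linear, the derivatives are constants and the step reduces to the ordinary Kalman filter.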

If  the  approximations  provided  by  the  first-order  Taylor  series  expansion  in  (3.31) 
and  (3.32)  are  not  good  enough,  additional  terms  can  be  retained.  This  leads  to  higher 
order  variants  of  the  EKF  algorithm.  For  example,  the  truncated  second-order  filter 
starts  with  a  quadratic  approximation  for  fk  and  hk  and  then  retains  all  second-order 
moments  during  the  ensuing  derivation.  The  Gaussian  second-order  filter  also  begins 
with  a  quadratic  approximation  of  the  nonlinearities.  However,  whereas  the  truncated 
second-order  filter  ignores  all  central  moments  of  xk  above  second  order,  the  Gaussian 
second-order  filter  accounts  for  the  fourth  central  moments  by  approximating  them 
using the values they would assume if the posterior were actually Gaussian [33-35]. In
either case, the additional complexity of these higher-order filters has prevented their
widespread use [36].

Interacting  multiple  model  algorithm 

In  many  tracking  applications,  targets  maneuver  in  different  fashions  at  different  times. 
For  example,  an  aircraft  might  fly  with  a  constant  velocity  and  heading  initially  before 
executing  a  tight  turn.  In  cases  such  as  these,  it  may  be  difficult  to  specify  a  single 
dynamic model that accurately represents the target’s behavior at all times. One
potential solution is to vary some of the parameters of the Kalman filter in an attempt to
more accurately match the evolving target dynamics. For example, in [37], the authors
adopted the system model $\mathbf{x}_k = F_k \mathbf{x}_{k-1} + G_k(\mathbf{u}_k + \boldsymbol{\lambda}_k)$, where $\{\boldsymbol{\lambda}_k\}$ was a
sequence of unknown pilot command vectors drawn from a finite set, $\{\mathbf{a}^{(1)}, \ldots, \mathbf{a}^{(N)}\}$.
The authors then formulated $\{\mathbf{z}_k\}$ as observations from a hidden Markov model with
$\{\boldsymbol{\lambda}_k\}$ as the hidden sequence. Under simplifying assumptions, their algorithm reduces
to a single Kalman filter with the conditional mean $\hat{\boldsymbol{\lambda}}_k = E[\boldsymbol{\lambda}_k | \mathbf{z}_{1:k}]$ as
supplementary input. In [38], the proposed filter switched back and forth between a
constant-velocity model and a constant-acceleration model based upon a fading memory
average of $(\mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1})' S_k^{-1} (\mathbf{z}_k - H_k \hat{\mathbf{x}}_{k|k-1})$. This algorithm attempts to estimate
the unknown acceleration during maneuvers as opposed to confining it (through the
use of fixed command inputs) to a predefined set. In either case, the restriction that a
single state model ($F_k$ or $f_k$) must be used for an entire sample period is a fundamental
limitation of both approaches.

Alternatively, the motion of a target that switches between different maneuver
modes can be modeled as a Jump Markov System (JMS),^4

$\mathbf{x}_k = f^{(\gamma_k)}(\mathbf{x}_{k-1}) + \mathbf{u}_k^{(\gamma_k)},$  (3.39)

$\mathbf{z}_k = h^{(\gamma_k)}(\mathbf{x}_k) + \mathbf{w}_k,$  (3.40)

where $\{\gamma_k\}$ is a finite-state Markov chain taking values in $\{1, 2, \ldots, N\}$ according
to the transition matrix $P_{\mathrm{IMM}}$. The components of $P_{\mathrm{IMM}}$ are defined as
$P_{\mathrm{IMM}}(i, j) = \Pr(\gamma_k = j | \gamma_{k-1} = i)$ for any $k$. As seen from the superscripts in (3.39) and (3.40), a
jump Markov system can have as many as $N$ distinct state models, each with its own
measurement function and process noise covariance. While this addresses the problem
of modeling the motion of a target that maneuvers in different fashions at different
times, optimal filtering algorithms for jump Markov systems have computational
complexity that increases exponentially with $k$. For example, if the system were linear
Gaussian when conditioned upon $\{\gamma_k\}$, a separate Kalman filter would be needed
for every possible model sequence, that is, $N^k$ filters after $k$ scans. Clearly, this can be
impractical for even a modest number of samples. The Interacting Multiple Model (IMM)
algorithm addresses this computational difficulty by merging the state estimates produced
by each model at the beginning of each sample interval [39,40]. Because this system will
be used as a performance benchmark in Chapter 7, we describe its implementation now.

^4 In general, the statistics of the measurement noise $\mathbf{w}_k$ are also allowed to vary with $\gamma_k$, but this is
unnecessary for our formulation.

Figure 3.1: Example of IMM algorithm with N = 2.

The IMM algorithm uses $N$ separate extended Kalman filters (or standard Kalman
filters, if $f^{(\gamma)}$ and $h^{(\gamma)}$ are linear). We will use superscripts to denote the state
estimates and covariances for each filter. Figure 3.1 provides a block diagram for the
case where $N = 2$. Looking at the figure, we see that each iteration of the IMM
algorithm proceeds in four steps. First, the previous outputs from each filter,
$\{\hat{\mathbf{x}}^{(\gamma)}_{k-1|k-1}, \Sigma^{(\gamma)}_{k-1|k-1}\}$, “interact” through the previous model probabilities $\mu^{(\gamma)}_{k-1}$ to
produce mixed inputs $\{\hat{\mathbf{x}}^{(\gamma)*}_{k-1|k-1}, \Sigma^{(\gamma)*}_{k-1|k-1}\}$ for the current scan. This is the key step
that keeps computation from growing exponentially. Next, each EKF operates as normal,
producing the updated outputs $\{\hat{\mathbf{x}}^{(\gamma)}_{k|k}, \Sigma^{(\gamma)}_{k|k}\}$. The likelihoods from each filter,
denoted $\Lambda^{(\gamma)}_k$, are then used to generate the updated model probabilities $\mu^{(\gamma)}_k$. Finally,
the updated probabilities are used to average the outputs of the individual filters. The
details of each step are presented next.

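1. Interaction (or Input Mixing)

Denote the a posteriori probability that the target was moving according to
model $\gamma$ at scan $k-1$ as $\mu^{(\gamma)}_{k-1}$. Then, we have

$\mu^{(i|\gamma)}_{k-1} \triangleq \Pr(\gamma_{k-1} = i | \gamma_k = \gamma, \mathbf{z}_{1:k-1}) = \frac{P_{\mathrm{IMM}}(i, \gamma) \, \mu^{(i)}_{k-1}}{\bar{c}^{(\gamma)}_{k-1}},$  (3.41)

where

$\bar{c}^{(\gamma)}_{k-1} = \sum_{i=1}^{N} P_{\mathrm{IMM}}(i, \gamma) \, \mu^{(i)}_{k-1}$  (3.42)

and $\gamma = 1, \ldots, N$. Note that $\mu^{(i|\gamma)}_{k-1}$ pertains to the “reverse” of a typical
transition. With these probabilities, the previous state estimates and covariance
matrices are mixed according to

$\hat{\mathbf{x}}^{(\gamma)*}_{k-1|k-1} = \sum_{i=1}^{N} \mu^{(i|\gamma)}_{k-1} \hat{\mathbf{x}}^{(i)}_{k-1|k-1},$  (3.43)

$\Sigma^{(\gamma)*}_{k-1|k-1} = \sum_{i=1}^{N} \mu^{(i|\gamma)}_{k-1} \left( \Sigma^{(i)}_{k-1|k-1} + \hat{\mathbf{x}}^{(i)}_{k-1|k-1} (\hat{\mathbf{x}}^{(i)}_{k-1|k-1})' \right) - \hat{\mathbf{x}}^{(\gamma)*}_{k-1|k-1} (\hat{\mathbf{x}}^{(\gamma)*}_{k-1|k-1})'.$  (3.44)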

Each  mixed  input  is  then  provided  to  its  respective  extended  Kalman  filter. 

2.  Kalman  Prediction  and  Update 

The Kalman prediction and update proceed as specified in the regular EKF (see
Equations (3.35)-(3.38)). The likelihood

$\Lambda^{(\gamma)}_k = \mathcal{N}(\mathbf{z}_k; h^{(\gamma)}(\hat{\mathbf{x}}^{(\gamma)}_{k|k-1}), S^{(\gamma)}_k)$  (3.45)

is also stored for use in the next step. Note that (3.45) assumes that the measurement
noise is Gaussian.

3.  Probability  Update 

Using the likelihoods from the previous step, the model probabilities are updated
according to

$\mu^{(\gamma)}_k = \frac{1}{c} \Lambda^{(\gamma)}_k \bar{c}^{(\gamma)}_{k-1},$ where $c = \sum_{i=1}^{N} \Lambda^{(i)}_k \bar{c}^{(i)}_{k-1}.$  (3.46)


4.  Output  Mixing 

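The state estimate and covariance matrix for the IMM algorithm are obtained as
weighted sums of the outputs from the individual filters. With the updated model
probabilities as the weights, we have

$\hat{\mathbf{x}}_{k|k} = \sum_{\gamma=1}^{N} \mu^{(\gamma)}_k \hat{\mathbf{x}}^{(\gamma)}_{k|k},$  (3.47)

$\Sigma_{k|k} = \sum_{\gamma=1}^{N} \mu^{(\gamma)}_k \left( \Sigma^{(\gamma)}_{k|k} + \hat{\mathbf{x}}^{(\gamma)}_{k|k} (\hat{\mathbf{x}}^{(\gamma)}_{k|k})' \right) - \hat{\mathbf{x}}_{k|k} (\hat{\mathbf{x}}_{k|k})'.$  (3.48)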

Note  that  the  IMM  algorithm  requires  N  times  the  computation  of  an  extended 
Kalman  filter.  Nonetheless,  its  excellent  performance  against  maneuvering  targets  has 
made  it  very  popular  in  the  radar  community.  As  such,  it  will  serve  as  a  benchmark 
when  we  present  our  experimental  results. 
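
The four steps above can be sketched for a toy scalar problem. This is a minimal illustration under stated assumptions, not the benchmark implementation used later: every model is taken to be a linear random walk $x_k = x_{k-1} + u_k$ observed as $z_k = x_k + w_k$, the models differ only in their process noise variance, and the names `imm_step`, `gauss_pdf`, `trans`, `q`, and `r` are hypothetical.

```python
import math


def gauss_pdf(z, mean, var):
    """Scalar Gaussian density N(z; mean, var)."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def imm_step(x, p, mu, z, trans, q, r):
    """One IMM iteration for N scalar random-walk models (illustrative sketch).

    x, p, mu : per-model estimates, variances, and model probabilities at k-1
    z        : measurement at scan k
    trans    : trans[i][g] = Pr(model g at k | model i at k-1)
    q        : per-model process noise variances
    r        : measurement noise variance
    """
    n = len(mu)

    # Step 1: interaction (input mixing).
    c_bar = [sum(trans[i][g] * mu[i] for i in range(n)) for g in range(n)]
    x_mix, p_mix = [], []
    for g in range(n):
        w = [trans[i][g] * mu[i] / c_bar[g] for i in range(n)]
        xm = sum(w[i] * x[i] for i in range(n))
        pm = sum(w[i] * (p[i] + x[i] ** 2) for i in range(n)) - xm ** 2
        x_mix.append(xm)
        p_mix.append(pm)

    # Step 2: per-model Kalman prediction and update, plus likelihood.
    x_new, p_new, lik = [], [], []
    for g in range(n):
        x_pred = x_mix[g]
        p_pred = p_mix[g] + q[g]
        s = p_pred + r
        gain = p_pred / s
        x_new.append(x_pred + gain * (z - x_pred))
        p_new.append(p_pred - gain * p_pred)
        lik.append(gauss_pdf(z, x_pred, s))

    # Step 3: model probability update.
    c = sum(lik[g] * c_bar[g] for g in range(n))
    mu_new = [lik[g] * c_bar[g] / c for g in range(n)]

    # Step 4: output mixing.
    x_out = sum(mu_new[g] * x_new[g] for g in range(n))
    p_out = sum(mu_new[g] * (p_new[g] + x_new[g] ** 2) for g in range(n)) - x_out ** 2
    return x_new, p_new, mu_new, x_out, p_out


if __name__ == "__main__":
    # Hypothetical two-model example: a quiet model and a maneuvering model.
    print(imm_step([0.0, 0.0], [1.0, 1.0], [0.5, 0.5], 8.0,
                   [[0.9, 0.1], [0.1, 0.9]], [0.01, 25.0], 1.0))
```

The cost per scan is that of N filters plus the mixing sums, regardless of how many scans have elapsed, which is the point of the input-mixing step.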


Unscented  Kalman  filter 

The unscented transform is motivated by the intuition that it may be easier to
approximate a probability distribution than an arbitrary nonlinear function. Thus, instead of
linearizing $f_k$ or $h_k$, the unscented transform deterministically generates a set of points
whose sample mean and sample covariance are equal to $\hat{\mathbf{x}}_{k-1|k-1}$ and $\Sigma_{k-1|k-1}$,
respectively. The nonlinear function is then applied to each of the sample points, yielding
a transformed sample from which the predicted mean and covariance are calculated
[41,42]. The unscented Kalman filter is so named because it implements the
Kalman recursion using the sample points provided by the unscented transform.
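
A minimal scalar sketch of the unscented transform is given below. The text above does not specify how the points are chosen, so the sketch assumes one common choice: three sigma points with a scaling parameter `kappa`, placed at the mean and at plus or minus $\sqrt{(1+\kappa)\,\mathrm{var}}$. The function name and arguments are hypothetical.

```python
import math


def unscented_transform(mean, var, g, kappa=2.0):
    """Propagate a scalar mean and variance through g (illustrative sketch)."""
    n = 1  # state dimension of this scalar example
    spread = math.sqrt((n + kappa) * var)

    # Deterministic sigma points whose weighted mean and variance
    # equal the input mean and variance.
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]

    # Apply the nonlinearity to each point, then recompute the moments.
    y = [g(x) for x in points]
    y_mean = sum(w * yi for w, yi in zip(weights, y))
    y_var = sum(w * (yi - y_mean) ** 2 for w, yi in zip(weights, y))
    return y_mean, y_var


if __name__ == "__main__":
    # Hypothetical nonlinearity g(x) = x**2 applied to a unit-variance input.
    print(unscented_transform(1.0, 1.0, lambda x: x * x))
```

No derivative of `g` appears anywhere in the sketch, which is the practical appeal noted in the next paragraph.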

The  unscented  Kalman  filter  has  two  advantages  over  the  EKF.  First,  the  estimate 
of  the  conditional  mean  provided  by  the  unscented  KF  can  be  shown  to  be  correct 
up  to  the  second  order  (of  its  Taylor  series  expansion).  Because  the  estimate  of  the 
conditional  mean  provided  by  the  EKF  is  only  correct  to  the  first  order,  the  unscented 
KF  can  sometimes  be  more  accurate  than  the  EKF  [41].  Second,  the  unscented  KF 
does not use any Jacobians ($F_k$ or $H_k$). This is beneficial because these matrices may
be  cumbersome  to  derive. 

Shortcomings  of  single  Gaussian  approximations 

All  of  the  single  Gaussian  approximations  that  we  have  presented  suffer  from  the  same 
shortcoming.  By  only  maintaining  estimates  of  the  conditional  mean  and  covariance, 
they implicitly approximate the posterior $p(\mathbf{x}_k | \mathbf{z}_{1:k})$ as Gaussian. If the true density
is  either  heavy-tailed  or  multimodal,  a  Gaussian  description  will  be  inaccurate.  As  an 
example,  consider  the  effect  of  target  glint  on  radar  measurements.  If  the  target  is 
modeled  as  a  collection  of  scattering  centers,  target  glint  is  the  error  in  state  estimation 
due  to  interference  between  these  phase  centers  as  the  target’s  aspect  with  respect  to 
the  radar  varies.  In  [12],  it  was  noted  that  glint  results  in  heavy-tailed,  non-Gaussian 
measurement  errors,  which  severely  degrade  the  performance  of  the  Kalman  filter.  A 
second  example  where  a  Gaussian  approximation  to  the  posterior  is  inappropriate  is 
the  case  of  tracking  in  the  presence  of  clutter.  In  [13],  the  goal  was  to  track  outlines 
of  foreground  objects  in  video  scenes  that  contained  substantial  background  clutter. 
One  specific  experiment  involved  the  task  of  tracking  a  person  walking  in  front  of 
other  people.  In  the  presence  of  such  clutter,  the  true  posterior  density  is  multimodal. 
However, because of its Gaussian approximation, the Kalman filter is unable to
accommodate simultaneous alternative hypotheses until they can be disambiguated by
future  measurements.  Instead,  the  authors  reported  that  the  Kalman  filter  tended  to  get 
“distracted”  by  background  clutter,  never  to  recover. 

Because  our  FM-band  passive  radar  must  track  multiple  targets  in  the  presence  of 
clutter  (i.e.,  false  alarms),  the  true  posterior  will  generally  be  multimodal.  As  such, 
the  single  Gaussian  approximations  are  poor  matches  for  our  application.  There  is  an 
additional difficulty with the EKF and the IMM algorithm. Because delay and Doppler
shift are nonlinear functions of position and velocity, the measurement model $h_k$ for
our application will be nonlinear as well. As discussed earlier, filtering within the
Kalman framework then requires knowledge of the Jacobian matrix $H_k$. The Jacobian
is required in order to map uncertainty in the measurement space into uncertainty in
the state space during the Kalman update. Thus, if we wish to include RCS as a feature
in our EKF data vector, we need to compute partial derivatives such as $\partial \sigma_k / \partial \mathbf{p}_k$,
where $\mathbf{p}_k$ is the position vector extracted from state $\mathbf{x}_k$. Because there is no
closed-form expression for RCS, it will generally be impossible to find a closed-form
expression for any partial derivative involving RCS (see Equations (2.1) and (2.5)). Instead,
these  derivatives  would  need  to  be  approximated  numerically,  but  the  computational 
demands  of  such  an  approach  would  be  impractical  for  real-time  applications  such  as 
ours.  Therefore,  the  need  to  compute  Jacobians  prevents  the  use  of  RCS  within  the 
Kalman  framework.  However,  because  the  majority  of  class  information  is  conveyed 
through  RCS  in  VHF-band  radar,  this  limitation  immediately  excludes  both  the  EKF 
and  the  IMM  algorithm  as  potential  filtering  solutions  for  our  joint  tracker/classifier. 

3.2.2  Sum  of  basis  functions 

In  the  previous  section,  we  considered  several  filtering  algorithms  that  approximated 
the  posterior  density  using  a  single  Gaussian  distribution.  For  applications  such  as 
multitarget  tracking  or  tracking  in  the  presence  of  clutter,  the  true  posterior  is  often 
multimodal. In these cases, the Gaussian assumption significantly reduces the amount of
available information. In fact, for multimodal systems, the EKF operates more as
a  maximum  likelihood  estimator  than  a  minimum  variance  estimator,  and  the  resulting 
estimate  of  the  conditional  mean  may  actually  just  follow  one  of  the  peaks  of  the  true 
posterior.  As  an  alternative  to  the  single  Gaussian  assumption,  a  sum  of  basis  functions 
can  be  used  to  approximate  the  true  posterior.  Although  many  different  bases  have 
been  proposed  in  the  literature,  one  particular  example  is  noteworthy. 

The Gaussian sum filter uses a weighted sum of Gaussian functions to approximate
a posterior distribution of arbitrary complexity [43,44]:

$\sum_{i=1}^{N} w^{(i)}_{k|k} \, \mathcal{N}(\mathbf{x}_k; \boldsymbol{\mu}^{(i)}_{k|k}, \Sigma^{(i)}_{k|k}) \approx p(\mathbf{x}_k | \mathbf{z}_{1:k}),$  (3.49)

where $\boldsymbol{\mu}^{(i)}_{k|k}$ and $\Sigma^{(i)}_{k|k}$ are the mean and covariance, respectively, of the $i$th Gaussian
function given $\mathbf{z}_{1:k}$, and $\{w^{(i)}_{k|k}\}_{i=1}^{N}$ is a set of nonnegative weights that sum to one. The
use of Gaussian functions has two distinct advantages over other choices for the basis.
First, the Gaussian functions lend a certain mathematical tractability to the associated
filtering algorithms. Second, the Gaussian sum is a valid density function for any $N$.
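Starting with a Gaussian sum approximation for $p(\mathbf{x}_k | \mathbf{z}_{1:k})$ and a state model of the
form $\mathbf{x}_k = f_k(\mathbf{x}_{k-1}) + \mathbf{u}_k$, the one-step prediction density is approximated as

$p(\mathbf{x}_{k+1} | \mathbf{z}_{1:k}) = \int p(\mathbf{x}_{k+1} | \mathbf{x}_k) \, p(\mathbf{x}_k | \mathbf{z}_{1:k}) \, d\mathbf{x}_k$

$\approx \int p_{\mathbf{u}_{k+1}}(\mathbf{x}_{k+1} - f_{k+1}(\mathbf{x}_k)) \cdot \sum_{i=1}^{N} w^{(i)}_{k|k} \, \mathcal{N}(\mathbf{x}_k; \boldsymbol{\mu}^{(i)}_{k|k}, \Sigma^{(i)}_{k|k}) \, d\mathbf{x}_k$

$= \sum_{i=1}^{N} w^{(i)}_{k|k} \int \mathcal{N}(\mathbf{x}_{k+1}; f_{k+1}(\mathbf{x}_k), Q_{k+1}) \, \mathcal{N}(\mathbf{x}_k; \boldsymbol{\mu}^{(i)}_{k|k}, \Sigma^{(i)}_{k|k}) \, d\mathbf{x}_k$  (3.50)

$\approx \sum_{i=1}^{N} w^{(i)}_{k+1|k} \, \mathcal{N}(\mathbf{x}_{k+1}; \boldsymbol{\mu}^{(i)}_{k+1|k}, \Sigma^{(i)}_{k+1|k}).$  (3.51)

In general, the $N$ integrals in (3.50) are intractable. In order to arrive at (3.51), $f_{k+1}$
is replaced by a first-order Taylor series expansion about $\boldsymbol{\mu}^{(i)}_{k|k}$. The exponent of the
integrand is then a quadratic function of $\mathbf{x}_k$, allowing the integration to be performed analytically.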

The  measurement  update  of  the  Gaussian  sum  filter  proceeds  in  a  similar  way. 
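Assuming the form $\mathbf{z}_k = h_k(\mathbf{x}_k) + \mathbf{w}_k$, we have

$p(\mathbf{x}_{k+1} | \mathbf{z}_{1:k+1}) = \frac{p(\mathbf{z}_{k+1} | \mathbf{x}_{k+1}) \, p(\mathbf{x}_{k+1} | \mathbf{z}_{1:k})}{c_{k+1}}$

$\approx \frac{p_{\mathbf{w}_{k+1}}(\mathbf{z}_{k+1} - h_{k+1}(\mathbf{x}_{k+1}))}{c_{k+1}} \sum_{i=1}^{N} w^{(i)}_{k+1|k} \, \mathcal{N}(\mathbf{x}_{k+1}; \boldsymbol{\mu}^{(i)}_{k+1|k}, \Sigma^{(i)}_{k+1|k})$

$= \frac{1}{c_{k+1}} \sum_{i=1}^{N} w^{(i)}_{k+1|k} \, \mathcal{N}(\mathbf{z}_{k+1}; h_{k+1}(\mathbf{x}_{k+1}), R_{k+1}) \, \mathcal{N}(\mathbf{x}_{k+1}; \boldsymbol{\mu}^{(i)}_{k+1|k}, \Sigma^{(i)}_{k+1|k})$  (3.52)

$\approx \sum_{i=1}^{N} w^{(i)}_{k+1|k+1} \, \mathcal{N}(\mathbf{x}_{k+1}; \boldsymbol{\mu}^{(i)}_{k+1|k+1}, \Sigma^{(i)}_{k+1|k+1}),$  (3.53)

where $c_{k+1}$ is a normalization constant. To arrive at (3.53), the measurement function
$h_{k+1}$ in (3.52) is replaced by a first-order Taylor series approximation about $\boldsymbol{\mu}^{(i)}_{k+1|k}$.
Each product in the sum in (3.52) then has an exponent that is quadratic in $\mathbf{x}_{k+1}$, and
Equation (3.53) follows.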

Thus,  we  see  that  the  Gaussian  sum  filter  is  indeed  able  to  maintain  a  multimodal 
approximation  to  the  true  posterior.  As  such,  it  has  the  potential  to  outperform  the 
EKF  when  used  for  tracking  in  the  presence  of  clutter  or  multiple  targets.  However, 
our  development  has  highlighted  a  crucial  shortcoming  that  it  does  share  with  the  EKF, 
namely the need to compute the Jacobian matrix for $h_k$. Recall that, in going from (3.52) to
(3.53), $h_{k+1}$ was replaced with a first-order Taylor series approximation. As discussed
in  the  previous  section,  in  general,  there  are  no  closed-form  relationships  for  partial 
derivatives  involving  RCS.  Thus,  incorporation  of  RCS  into  the  measurement  vector 
prevents  use  of  the  Gaussian  sum  filter,  just  as  it  ruled  out  the  EKF  and  its  variants. 

At this point, it would seem that the demanding nature of our RCS-based joint
tracking/classification application has exhausted our Bayesian filtering options.
Fortunately, this is not the case. There is a whole class of recursive Bayesian algorithms
that we have not considered yet, but they require a decidedly “non-Kalman” approach.
These filters are based upon Monte Carlo (or random sampling) techniques, and their
properties are the subject of the next chapter.

CHAPTER 4


MONTE  CARLO-BASED 
APPROACHES  TO  BAYESIAN 
FILTERING 


In the last chapter, we presented the Bayesian approach to filtering. In order to avoid
having to store the past measurement sequence, we insisted upon a recursive
solution. We then showed that such a solution existed if $\{\mathbf{x}_k\}$ was Markov and $\mathbf{z}_{1:k}$ was
conditionally independent given $\mathbf{x}_{1:k}$. Unfortunately, the two-step recursive Bayesian
solution provided by Lemmas 1 and 2, while conceptually appealing, proved to be
intractable in all but two cases: (1) discrete, finite state spaces or (2) linear, Gaussian
system models. Because our state space is continuous and our measurement model is
nonlinear, we then considered several suboptimal extensions of the Kalman filtering
algorithm. We found that they all suffered from at least one of two severe drawbacks:
they either assumed a Gaussian distribution for $p(\mathbf{x}_k | \mathbf{z}_{1:k})$, or they assumed knowledge
of the Jacobian $\partial h_k / \partial \mathbf{x}_k$. The first assumption is inappropriate for tracking
applications involving clutter or multiple targets because $p(\mathbf{x}_k | \mathbf{z}_{1:k})$ can then be multimodal.
The second assumption fails because we wish to include radar cross section in $\mathbf{z}_k$ for
the purpose of performing target classification. Because there is no closed-form
relationship between the components of $\mathbf{x}_k$ (such as position and orientation) and RCS, it
is impractical to compute (or estimate) $\partial h_k / \partial \mathbf{x}_k$.

It  might  seem  as  though  our  joint  tracking/classification  problem  does  not  admit  a 
practical  recursive  Bayesian  solution.  As  it  turns  out,  what  is  really  at  fault  is  the  notion 
of extending the Kalman filtering algorithm to an application for which it was never
intended. Instead, we need a radically different approach for joint tracking/classification.
In this chapter, we will explore Monte Carlo (or sampling)-based approaches to
recursive Bayesian filtering. We will find that their flexibility allows them to succeed where
the Kalman filter and its variants failed.

4.1 Sampling from an Arbitrary Distribution


In  this  section,  we  provide  the  necessary  theoretical  background  for  the  particle  filtering 
algorithm  introduced  later  in  this  chapter.  Our  presentation  draws  on  overviews  given 
in  [45,46].  We  begin  our  discussion  of  Monte  Carlo  (MC)-based  Bayesian  filtering 
techniques  by  considering  a  slightly  different  problem,  namely  the  computation  of  the 
high-dimensional  integral 

$I(g) = \int g(\mathbf{x}) \, p(\mathbf{x}|\mathbf{z}) \, d\mathbf{x},$  (4.1)

where $g(\cdot)$ is any $p(\mathbf{x}|\mathbf{z})$-integrable function. Because of integrals similar to (4.1), the
recursive Bayesian filtering solution provided by Lemmas 1 and 2 was deemed
impractical. More specifically, because the integration in (4.1) occurs over an $n_x$-dimensional
space and $p(\mathbf{x}|\mathbf{z})$ may well be multivariate or nonstandard, (4.1) is typically intractable.
Instead of seeking an analytic solution, an approximation, which we denote $I_{N_p}(g)$, can
be obtained by Monte Carlo integration:


$I_{N_p}(g) = \frac{1}{N_p} \sum_{i=1}^{N_p} g(\mathbf{x}^{(i)}),$  (4.2)

where $\{\mathbf{x}^{(i)} : i = 1, \ldots, N_p\}$ are drawn independently from $p(\mathbf{x}|\mathbf{z})$. The validity of
this approximation is guaranteed by the strong law of large numbers (SLLN), which
states that the average of many independent random variables with common mean and
finite variance converges to their common mean:

$\lim_{N_p \to \infty} I_{N_p}(g) = I(g),$ with probability one.  (4.3)

Furthermore, if the variance $\sigma_g^2$ of $g(\mathbf{x})$ with respect to $p(\mathbf{x}|\mathbf{z})$ is finite, a central limit
theorem also holds:

$\sqrt{N_p} \, (I_{N_p}(g) - I(g)) \to \mathcal{N}(0, \sigma_g^2),$ in distribution.  (4.4)

Figure 4.1: A simple example of rejection sampling. The sample $x^{(i)}$ is drawn from
a uniform proposal distribution, $\pi(x|z) = U_{[0, x_{\max}]}(x)$. The pink bars indicate the
probabilities of acceptance and rejection.

Equation (4.4) is quite useful because it indicates how the error in our Monte Carlo
approximation varies with $N_p$. The advantage of Monte Carlo integration is clear.
Whereas a Riemann approximation, which samples the state space in a deterministic
manner, has an accuracy of $O(N_p^{-1/n_x})$ for an $n_x$-dimensional integral, the Monte
Carlo approximation, which samples the state space randomly, has an accuracy of
$O(N_p^{-1/2})$, independent of $n_x$ [46]. It is in this sense that Monte Carlo integration
is said to “beat the curse of dimensionality” [47]. Unfortunately, $N_p$ is only half the
story in (4.4); $\sigma_g^2$ may grow appreciably as the dimension of the state space increases.
Thus, although Monte Carlo integration is theoretically preferable in high-dimensional
spaces to deterministic techniques, both types of approximation may be inaccurate,
depending on how $\sigma_g^2$ scales with $n_x$. Furthermore, as a random sampling technique, the
approximation $I_{N_p}(g)$ requires the ability to draw independent samples $\{\mathbf{x}^{(i)}\}$ from
$p(\mathbf{x}|\mathbf{z})$. As such, a nonstandard $p(\mathbf{x}|\mathbf{z})$ can present a substantial difficulty for MC
integration. We consider this problem next.
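
As a small illustration of (4.2), the sketch below estimates an expectation by averaging over independent draws. It assumes a case where direct sampling is possible: a standard normal density stands in for $p(\mathbf{x}|\mathbf{z})$, and the function name `mc_integrate` is hypothetical.

```python
import random


def mc_integrate(g, draw, n_p):
    """Monte Carlo estimate of E[g(x)] from n_p independent draws."""
    return sum(g(draw()) for _ in range(n_p)) / n_p


if __name__ == "__main__":
    rng = random.Random(0)
    # E[x**2] under a standard normal equals 1; by (4.4) the error of the
    # estimate shrinks on the order of 1/sqrt(n_p).
    for n_p in (100, 10000):
        print(n_p, mc_integrate(lambda x: x * x, lambda: rng.gauss(0.0, 1.0), n_p))
```

The sketch only works because `rng.gauss` can draw from the target directly, which is exactly the ability that a nonstandard posterior does not offer.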

One  of  the  main  tasks  of  any  Monte  Carlo  method  is  the  generation  of  independent 
samples  from  a  target  distribution  such  as  p(x|z).  In  general,  it  is  not  possible  to  draw 
these  samples  directly.  Instead,  one  of  three  indirect  approaches  is  typically  used.  We 
briefly  introduce  each  as  a  means  of  motivating  the  particle  filtering  algorithm.  We  will 
find  that  these  indirect  samples  are  either  dependent  or  drawn  from  some  distribution 
other than $p(\mathbf{x}|\mathbf{z})$ [48]. We will use $\pi(\mathbf{x}|\mathbf{z}) \neq p(\mathbf{x}|\mathbf{z})$ to denote a proposal distribution
from which we are able to draw samples. Also, for all three sampling techniques, we
assume that $p(\mathbf{x}|\mathbf{z})$ can be evaluated for any $\mathbf{x}$.

4.1.1  Rejection  sampling 

For a given $\pi(\mathbf{x}|\mathbf{z})$, suppose that we can find a constant $C$ such that

$C \pi(\mathbf{x}|\mathbf{z}) \geq p(\mathbf{x}|\mathbf{z})$

for all $\mathbf{x}$. Then, the rejection sampling algorithm is as follows [49]:

1. Draw $\mathbf{x}^{(i)}$ from $\pi(\mathbf{x}|\mathbf{z})$ and compute the ratio

$r = \frac{p(\mathbf{x}^{(i)}|\mathbf{z})}{C \pi(\mathbf{x}^{(i)}|\mathbf{z})}.$

2. Draw $u \sim U_{[0,1]}$, where $U_{[0,1]}$ is the uniform distribution on $[0, 1]$.

3. If $u < r$, accept $\mathbf{x}^{(i)}$; otherwise reject it.

It can be shown that the accepted samples follow the distribution $p(\mathbf{x}|\mathbf{z})$. A simple
example involving a true distribution with support $\{x : p(x|z) > 0\} = [0, x_{\max}]$ is shown in
Figure 4.1. The proposal distribution is taken to be uniform, $\pi(x|z) = U_{[0, x_{\max}]}(x)$.
In this case, a sample $x^{(i)} \sim U_{[0, x_{\max}]}$ is proposed and accepted with probability
$r = x_{\max} \, p(x^{(i)}|z) / C$.

In the general case, the probability of acceptance is

$\Pr(\mathrm{accept}|\mathbf{z}) = \int \Pr(\mathrm{accept}|\mathbf{x}, \mathbf{z}) \, \pi(\mathbf{x}|\mathbf{z}) \, d\mathbf{x} = \int \frac{p(\mathbf{x}|\mathbf{z})}{C \pi(\mathbf{x}|\mathbf{z})} \, \pi(\mathbf{x}|\mathbf{z}) \, d\mathbf{x} = \frac{1}{C}.$  (4.5)


Thus,  as  C  grows,  rejection  sampling  becomes  increasingly  inefficient  at  generating 
samples  from  p(x|z).  For  C  to  be  chosen  as  conservatively  as  possible,  the  maximum 
value  of  p(x|z)/7r(x|z)  must  be  found.  Because  p(x|z)  is  nonstandard,  this  may  be  a 
challenging nonlinear optimization problem in itself. On the other hand, if we
heuristically increase $C$ to ensure that it is larger than the (unknown) maximum, the efficiency
of the algorithm will suffer. Overall, because of the relative inefficiency indicated by
(4.5), rejection sampling is unsuitable for real-time applications such as radar that
require samples from $p(\mathbf{x}|\mathbf{z})$ at each scan.
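
The uniform-proposal example of Figure 4.1 can be sketched as follows. The target used here, a triangular density $p(x) = 2x$ on $[0, 1]$, is a hypothetical stand-in chosen so that the smallest valid constant is $C = 2$; the function name and arguments are likewise assumptions.

```python
import random


def rejection_sample(p, x_max, c, n, rng):
    """Draw n samples from density p on [0, x_max] using a uniform proposal.

    c must satisfy c * (1 / x_max) >= p(x) on the whole support.
    Returns the accepted samples and the number of proposals made.
    """
    samples, proposals = [], 0
    while len(samples) < n:
        x = rng.uniform(0.0, x_max)   # step 1: propose from the uniform density
        proposals += 1
        r = x_max * p(x) / c          # acceptance ratio p / (C * pi)
        if rng.random() < r:          # steps 2-3: accept with probability r
            samples.append(x)
    return samples, proposals


if __name__ == "__main__":
    rng = random.Random(1)
    samples, proposals = rejection_sample(lambda x: 2.0 * x, 1.0, 2.0, 5000, rng)
    # By (4.5), the fraction accepted should be close to 1 / C.
    print(len(samples) / proposals)
```

Doubling `c` beyond the true maximum would halve the acceptance rate while leaving the distribution of accepted samples unchanged.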


4.1.2  Metropolis-Hastings  algorithm 

The  Metropolis-Hastings  algorithm  was  suggested  by  Metropolis  et  al.  [50]  and  later 
extended  by  Hastings  [51].  The  main  idea  behind  the  Metropolis-Hastings  algorithm 
is  to  simulate  a  Markov  chain  in  the  state  space  of  x  so  that  the  stationary  distribution 
of  the  chain  is  p(x|z).  Note  that  in  Markov  chain  analysis,  one  is  typically  given 
a  transition  function  and  asked  to  find  the  corresponding  stationary  distribution.  With 
the  Metropolis-Hastings  algorithm,  we  consider  the  inverse  problem:  given  a  stationary 
distribution,  find  a  transition  function  that  reaches  this  equilibrium  point  efficiently. 
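Starting with any initial sample $\mathbf{x}^{(0)}$, the Metropolis-Hastings algorithm proceeds as
follows [46]:

1. Draw $\tilde{\mathbf{x}} \sim \pi(\cdot|\mathbf{x}^{(t)})$ and compute the ratio

$r(\tilde{\mathbf{x}}, \mathbf{x}^{(t)}) = \min\left\{ \frac{p(\tilde{\mathbf{x}}|\mathbf{z}) \, \pi(\mathbf{x}^{(t)}|\tilde{\mathbf{x}})}{p(\mathbf{x}^{(t)}|\mathbf{z}) \, \pi(\tilde{\mathbf{x}}|\mathbf{x}^{(t)})}, 1 \right\}.$

2. Draw $u \sim U_{[0,1]}$.

3. If $u < r(\tilde{\mathbf{x}}, \mathbf{x}^{(t)})$, set $\mathbf{x}^{(t+1)} = \tilde{\mathbf{x}}$; otherwise set $\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)}$.

Note that for $\tilde{\mathbf{x}} \neq \mathbf{x}$, the actual transition function of the algorithm is $\pi(\tilde{\mathbf{x}}|\mathbf{x}) \, r(\tilde{\mathbf{x}}, \mathbf{x})$
(i.e., the proposal probability times the acceptance probability). The only serious
restriction on our choice of the proposal distribution is that $\pi(\tilde{\mathbf{x}}|\mathbf{x}) > 0$ if and only
if $\pi(\mathbf{x}|\tilde{\mathbf{x}}) > 0$. It can be shown that the chain induced by the Metropolis-Hastings
algorithm is reversible^1 and has $p(\mathbf{x}|\mathbf{z})$ as its invariant distribution [46].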

^1 In the Markov chain literature, chains that satisfy the detailed balance condition are referred to as
reversible. Detailed balance is a condition that ensures invariance with respect to the transition function
$\pi(\tilde{\mathbf{x}}|\mathbf{x}) \, r(\tilde{\mathbf{x}}, \mathbf{x})$.

We notice that, unlike rejection sampling, the Metropolis-Hastings algorithm does
not require the maximization of $p(\mathbf{x}|\mathbf{z})/\pi(\mathbf{x}|\mathbf{z})$. In fact, because $p(\cdot|\mathbf{z})$ appears in both
the numerator and denominator of $r(\tilde{\mathbf{x}}, \mathbf{x})$, we only need to evaluate it up to a
normalizing constant. Nonetheless, the Metropolis-Hastings algorithm has two drawbacks
that make it unsuitable for many real-time applications. First, it typically requires a
substantial burn-in period (possibly on the order of thousands of samples) before the
chain reaches its stationary distribution and samples can be collected. Even more
troubling, though, is that the resulting samples are clearly dependent. In order to reduce
the correlation between consecutive $\mathbf{x}^{(t)}$, researchers sometimes choose to keep every
$n$th sample (e.g., $\{\mathbf{x}^{(50)}, \mathbf{x}^{(100)}, \mathbf{x}^{(150)}, \ldots\}$ for $n = 50$).
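
A minimal sketch of the algorithm, including burn-in and thinning, is given below. It makes a simplifying assumption that the text does not: the proposal is a symmetric Gaussian random walk, so the proposal terms cancel in the acceptance ratio. The target is supplied as an unnormalized log-density, and all names are hypothetical.

```python
import math
import random


def metropolis_hastings(log_p, x0, step, n_iter, burn_in, thin, rng):
    """Random-walk Metropolis-Hastings for a scalar target (illustrative sketch).

    log_p   : log of the target density, known only up to an additive constant
    step    : standard deviation of the symmetric Gaussian proposal
    burn_in : number of initial iterations to discard
    thin    : keep every thin-th sample after burn-in
    """
    x = x0
    kept = []
    for t in range(n_iter):
        cand = x + rng.gauss(0.0, step)
        # Symmetric proposal: the ratio reduces to p(cand) / p(x), capped at 1.
        log_r = min(0.0, log_p(cand) - log_p(x))
        if rng.random() < math.exp(log_r):
            x = cand
        if t >= burn_in and (t - burn_in) % thin == 0:
            kept.append(x)
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical target: a unit-variance Gaussian centered at 3, unnormalized.
    chain = metropolis_hastings(lambda x: -0.5 * (x - 3.0) ** 2,
                                0.0, 1.0, 20000, 1000, 10, rng)
    print(sum(chain) / len(chain))
```

Note that only one retained sample is produced for every `thin` evaluations of the target, in addition to the discarded burn-in, which illustrates the cost of both drawbacks.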

4.1.3  Importance  sampling 

Thus  far,  we  have  considered  two  approaches  to  generating  samples  from  an  arbitrary 
distribution. Rejection sampling is typically inefficient and presumes knowledge of
$\max_{\mathbf{x}} \{p(\mathbf{x}|\mathbf{z})/\pi(\mathbf{x}|\mathbf{z})\}$, while the Metropolis-Hastings algorithm requires a
substantial burn-in period and generates dependent samples. As such, both techniques are
unsuitable for real-time applications. In this section, we introduce a third approach,
importance sampling [52], and demonstrate that it is well-matched to our intended
application. In the next section, we will introduce the sequential extension of the
importance sampling algorithm, which is equivalent to particle filtering.

Recall that rejection sampling required knowledge of a constant $C$ such that
$C \geq p(\mathbf{x}|\mathbf{z})/\pi(\mathbf{x}|\mathbf{z})$ for all $\mathbf{x}$. Importance sampling has no such constraint; instead it merely
requires that $\pi(\mathbf{x}|\mathbf{z}) > 0$ whenever $p(\mathbf{x}|\mathbf{z}) > 0$. This requirement is necessary;
otherwise, there would be regions of the state space where $p(\mathbf{x}|\mathbf{z})$ was nonzero that could
not be “reached” by samples drawn from $\pi(\mathbf{x}|\mathbf{z})$. With this assumption, the importance
sampling algorithm can be motivated by simply rewriting the integral from (4.1),

1(a)  =  [  g(x)  7r(x|z)  dx  (4.6) 

J  7r(x|z) 

=  E7r[gr(x)^A7*(x)|z],  (4.7) 

where 

ru*(x)  =  p(x|z)/-7r(x|z),  (4.8) 

and $E_\pi[\,\cdot\,|z]$ is the expectation taken with respect to $\pi(x|z)$. By design, it should be
easy to sample from the proposal distribution. Thus, after drawing $N_p$ independent
samples $\{x^{(1)}, x^{(2)}, \ldots, x^{(N_p)}\}$ according to $\pi(x|z)$, the expectation in (4.7) can be
approximated via MC integration as

$$ I^*_{N_p}(g) = \frac{1}{N_p} \sum_{i=1}^{N_p} g(x^{(i)})\,w^{*(i)}, \tag{4.9} $$


where $w^{*(i)}$ is shorthand for $w^*(x^{(i)})$. The set $\{w^{*(1)}, w^{*(2)}, \ldots, w^{*(N_p)}\}$ is referred
to as the importance weights. An explicit definition follows from (4.8):

$$ w^{*(i)} \triangleq \frac{p(x^{(i)}|z)}{\pi(x^{(i)}|z)} = \frac{p(z|x^{(i)})\,p(x^{(i)})}{p(z)\,\pi(x^{(i)}|z)}, \tag{4.10} $$


where we applied Bayes' rule to rewrite $p(x^{(i)}|z)$. Just as in (4.2), the estimate $I^*_{N_p}(g)$
is unbiased. Furthermore, if the variance of $g(x)$ with respect to $\pi(x|z)$ is finite,
$I^*_{N_p}(g)$ can be shown to converge with probability one to $I(g)$ as $N_p \to \infty$ by the
strong law of large numbers. Unfortunately, there is a practical difficulty with (4.10)
that must be addressed. So far in this chapter, we have assumed that $p(x|z)$ could be
evaluated, even if it was impossible to draw samples from it. For the types of stochastic
models discussed in Chapter 3 (i.e., Markov state sequences observed through memoryless
channels), this is generally not the case. While it is usually straightforward to
evaluate the likelihood $p(z|x)$ and the prior $p(x)$ for such models, the calculation of
the normalization term $p(z)$ in the denominator of (4.10) is often intractable.

In order to avoid the need to evaluate $p(z)$, we use Bayes' rule to rewrite (4.6) as

$$ I(g) = \frac{1}{p(z)} \int g(x)\,\frac{p(z|x)\,p(x)}{\pi(x|z)}\,\pi(x|z)\,dx \tag{4.11} $$

$$ = \frac{\int g(x)\,w(x)\,\pi(x|z)\,dx}{\int w(x)\,\pi(x|z)\,dx} \tag{4.12} $$

$$ = \frac{E_\pi[\,g(x)\,w(x)\,|\,z\,]}{E_\pi[\,w(x)\,|\,z\,]}, \tag{4.13} $$

where

$$ w(x) = p(z|x)\,p(x)/\pi(x|z) \tag{4.14} $$

can be computed. We see that the normalization term $p(z)$ from (4.11) has been expanded
as the integral in the denominator of (4.12). In this case, a set of $N_p$ independent
samples drawn from $\pi(x|z)$ can be used to estimate both expectations in (4.13),
thereby yielding a new approximation for $I(g)$,

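$$ \hat{I}_{N_p}(g) = \frac{\frac{1}{N_p}\sum_{i=1}^{N_p} g(x^{(i)})\,w^{(i)}}{\frac{1}{N_p}\sum_{j=1}^{N_p} w^{(j)}} = \sum_{i=1}^{N_p} g(x^{(i)})\,\tilde{w}^{(i)}, \tag{4.15} $$

where $w^{(i)}$ is shorthand for $w(x^{(i)})$ and

$$ \tilde{w}^{(i)} = \frac{w^{(i)}}{\sum_{j=1}^{N_p} w^{(j)}} = \frac{w(x^{(i)})}{\sum_{j=1}^{N_p} w(x^{(j)})}. \tag{4.16} $$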


From (4.16), we notice that the $\{\tilde{w}^{(i)}\}$ are normalized to sum to one. To avoid confusion,
from here forward we shall refer to $\{w^{*(i)}\}$ as the "true" importance weights
and the $\{w^{(i)}\}$ and $\{\tilde{w}^{(i)}\}$ as the unnormalized and normalized importance weights,
respectively. In addition, all references to importance sampling will imply $\hat{I}_{N_p}(g)$ in
(4.15), unless otherwise noted (as opposed to $I^*_{N_p}(g)$ in (4.9)). Because $\hat{I}_{N_p}(g)$ is a
ratio of estimates, it will typically be biased. However, the strong law of large numbers
still applies, and thus $\hat{I}_{N_p}(g) \to I(g)$, with probability one, as $N_p \to \infty$. Comparing
(4.9) and (4.15) reveals that the unknown true importance weights $\{w^{*(i)}\}$ are being
approximated by $\{N_p \tilde{w}^{(i)}\}$ in $\hat{I}_{N_p}(g)$.

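Before proceeding, it is insightful to rewrite the approximation $\hat{I}_{N_p}(g)$ provided
by importance sampling in a slightly different way. Denoting the Dirac delta measure
$\delta(x - x^{(i)})$ as $\delta_{x^{(i)}}(x)$, we can express (4.15) as

$$ \hat{I}_{N_p}(g) = \sum_{i=1}^{N_p} \left( \int g(x)\,\delta_{x^{(i)}}(x)\,dx \right) \tilde{w}^{(i)} = \int g(x) \left( \sum_{i=1}^{N_p} \tilde{w}^{(i)}\,\delta_{x^{(i)}}(x) \right) dx. \tag{4.17} $$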


Then, defining the empirical (random) measure $\hat{p}_{N_p}(x|z)$ generated by the samples
$\{x^{(i)}\}$ drawn from $\pi(x|z)$ as

$$ \hat{p}_{N_p}(x|z) = \sum_{i=1}^{N_p} \tilde{w}^{(i)}\,\delta_{x^{(i)}}(x), \tag{4.18} $$


we can relate (4.1) and (4.17) as

$$ \hat{I}_{N_p}(g) = \int g(x)\,\hat{p}_{N_p}(x|z)\,dx \approx \int g(x)\,p(x|z)\,dx, \tag{4.19} $$

where the approximation improves as $N_p \to \infty$. Equation (4.19) is quite useful because
it indicates the sense in which importance sampling can be considered a density
estimation technique. Thus, we can think of the outcome of the importance sampling
algorithm as either an MC approximation for the integral $\int g(x)\,p(x|z)\,dx$ or an empirical
estimate $\hat{p}_{N_p}(x|z)$ of the posterior $p(x|z)$. As it happens, both are different sides
of the same coin. Because the focus of Bayesian filtering is the posterior $p(x|z)$, we
will adopt the latter interpretation.

In summary, the importance sampling algorithm provides us with a means of constructing
an empirical approximation to the true posterior, without requiring the ability
to sample from $p(x|z)$. Unlike rejection sampling and the Metropolis-Hastings algorithm,
where $\{x^{(i)}\}$ are distributed according to the true posterior,² the samples produced
by importance sampling are distributed according to the proposal distribution
$\pi(x|z)$. However, with the calculation of the normalized importance weights $\{\tilde{w}^{(i)}\}$,
the importance sampling algorithm yields an empirical estimate of $p(x|z)$, as indicated
in (4.19). Furthermore, importance sampling is the only technique of the three that is
appropriate for real-time applications because it is efficient (i.e., all samples are kept)
and does not require a lengthy burn-in period. Having established the fundamentals of
importance sampling, we now need to extend the algorithm for the sequential estimation
of densities $p(x_k|z_{1:k})$ for $k = 1, 2, \ldots$, as opposed to the single-state form $p(x|z)$
that we have been considering.

² More specifically, for rejection sampling, samples that are accepted are distributed according to $p(x|z)$.
Likewise, for the Metropolis-Hastings algorithm, samples drawn after the initial burn-in period are
distributed according to $p(x|z)$.
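
The self-normalized estimate in (4.14) through (4.16) can be sketched in a few lines of Python. The fragment below is only an illustration under assumed conditions that are not part of our application: a scalar state with prior N(0, 1), likelihood N(x, 1), and a wide Gaussian proposal N(0, 4). For this toy model the exact posterior is N(z/2, 1/2), which gives a reference value for the estimate. The function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(g, z, num_samples, rng):
    """Self-normalized importance sampling estimate of E[g(x) | z], as in (4.15).

    Toy model (an assumption made only for this sketch):
      prior      p(x)      = N(0, 1)
      likelihood p(z | x)  = N(x, 1)
      proposal   pi(x | z) = N(0, 2^2), positive wherever p(x | z) is positive.
    """
    samples = [rng.gauss(0.0, 2.0) for _ in range(num_samples)]
    # Unnormalized weights (4.14): w(x) = p(z|x) p(x) / pi(x|z); p(z) is never needed.
    weights = [
        normal_pdf(z, x, 1.0) * normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)
        for x in samples
    ]
    total = sum(weights)
    normalized = [w / total for w in weights]  # normalized weights (4.16)
    estimate = sum(g(x) * w for x, w in zip(samples, normalized))
    return estimate, normalized


rng = random.Random(0)
posterior_mean, normalized_weights = importance_sampling_estimate(
    lambda x: x, 1.0, 20000, rng
)
# For z = 1 the exact posterior mean of this toy model is 0.5.
```

Note that every sample is kept and only the weights change, which is the property that makes the approach attractive for real-time use.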
4.2 Sequential Importance Sampling


In this section, we introduce the sequential extension of the importance sampling algorithm.
The sequential importance sampling (SIS) algorithm will allow us to approximate
posterior densities of the form $p(x_k|z_{1:k})$, as required for Bayesian filtering. For
the SIS algorithm to be recursive, we will have to constrain the form of the proposal
distribution. The following discussion draws upon material from the tutorial by
Arulampalam et al. [36].

In Section 4.1.3, we showed how importance sampling can be used to generate
a random measure $\hat{p}_{N_p}(x|z)$ that approximates the true posterior. From (4.18), we
see that $\hat{p}_{N_p}(x|z)$ is completely specified by the set $\{x^{(i)}, \tilde{w}^{(i)}\}_{i=1}^{N_p}$, where $\tilde{w}^{(i)}$ is the
normalized importance weight corresponding to $x^{(i)}$. Thus, $\hat{p}_{N_p}(x|z)$ consists of a set
of (random) support points and their associated weights. In the context of Bayesian
filtering, each support point is actually a randomly generated sequence of states, which
we denote $x_{0:k}^{(i)}$. In this case, our random measure $\hat{p}_{N_p}(x_{0:k}|z_{1:k})$ is specified by the
set $\{x_{0:k}^{(i)}, \tilde{w}_k^{(i)}\}_{i=1}^{N_p}$, where the unnormalized weights introduced in (4.14) now satisfy

$$ w_k^{(i)} = \frac{p(z_{1:k}|x_{0:k}^{(i)})\,p(x_{0:k}^{(i)})}{\pi(x_{0:k}^{(i)}|z_{1:k})}. \tag{4.20} $$


Thus, upon receipt of the current measurement $z_k$, the sequential extension of the importance
sampling algorithm must transform the random measure $\{x_{0:k-1}^{(i)}, \tilde{w}_{k-1}^{(i)}\}_{i=1}^{N_p}$
into $\{x_{0:k}^{(i)}, \tilde{w}_k^{(i)}\}_{i=1}^{N_p}$. If the proposal distribution is chosen to factor as

$$ \pi(x_{0:k}|z_{1:k}) = \pi(x_k|x_{0:k-1}, z_{1:k})\,\pi(x_{0:k-1}|z_{1:k-1}), \tag{4.21} $$

then a sample $x_{0:k}^{(i)} \sim \pi(x_{0:k}|z_{1:k})$ can be obtained by the following simple procedure:

1. Draw $x_k^{(i)} \sim \pi(x_k|x_{0:k-1}^{(i)}, z_{1:k})$.

2. Set $x_{0:k}^{(i)} = \{x_{0:k-1}^{(i)}, x_k^{(i)}\}$.

Without the factorization in (4.21), drawing a sample from $\pi(x_{0:k}|z_{1:k})$ would require
that the entire previous state history $x_{0:k-1}$ be resampled. Clearly, as $k$ grows, this will
become infeasible for any real-time application. However, with the factorization in
(4.21), we only need to draw $N_p$ state vectors during each measurement update. This
factorization of the proposal distribution also simplifies the form of the importance
weights. Substituting (4.21) into (4.20) yields

$$ w_k^{(i)} = \frac{p(z_{1:k}|x_{0:k}^{(i)})\,p(x_{0:k}^{(i)})}{\pi(x_k^{(i)}|x_{0:k-1}^{(i)}, z_{1:k})\,\pi(x_{0:k-1}^{(i)}|z_{1:k-1})} \tag{4.22} $$

$$ = \frac{p(z_k|x_k^{(i)})\,p(x_k^{(i)}|x_{k-1}^{(i)})}{\pi(x_k^{(i)}|x_{0:k-1}^{(i)}, z_{1:k})} \cdot \frac{p(z_{1:k-1}|x_{0:k-1}^{(i)})\,p(x_{0:k-1}^{(i)})}{\pi(x_{0:k-1}^{(i)}|z_{1:k-1})} \tag{4.23} $$

$$ = \frac{p(z_k|x_k^{(i)})\,p(x_k^{(i)}|x_{k-1}^{(i)})}{\pi(x_k^{(i)}|x_{0:k-1}^{(i)}, z_{1:k})} \cdot w_{k-1}^{(i)}, \tag{4.24} $$

where we used the Markov property of $x_{0:k}$ and the conditional independence of $z_{1:k}$
given $x_{0:k}$ to go from (4.22) to (4.23).

Because we wish to perform recursive Bayesian filtering, the proposal distribution
in the denominator of (4.24) must be modified further. First, because we do not wish
to store the previous measurement sequence $z_{1:k-1}$, the proposal distribution should
be redefined as conditional on the current measurement $z_k$ only. Second, because we
are interested in obtaining an estimate for the filtering density $p(x_k|z_{1:k})$, it would
be preferable if we did not have to store the entire previous state history $x_{0:k-1}^{(i)}$ in
order to draw $x_k^{(i)}$. This implies dropping the conditioning on $x_{0:k-2}^{(i)}$ in the proposal
distribution of (4.24). Applying these two modifications recursively to $\pi(x_{0:k}|z_{1:k})$
yields the proposal distribution

$$ \pi(x_{0:k}^{(i)}|z_{1:k}) = \pi(x_0^{(i)}) \prod_{j=1}^{k} \pi(x_j^{(i)}|x_{j-1}^{(i)}, z_j). \tag{4.25} $$


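The unnormalized importance weights $\{w_k^{(i)}\}$ from (4.24) then assume the following
simple form:

$$ w_k^{(i)} = \frac{p(z_k|x_k^{(i)})\,p(x_k^{(i)}|x_{k-1}^{(i)})}{\pi(x_k^{(i)}|x_{k-1}^{(i)}, z_k)} \cdot w_{k-1}^{(i)}. \tag{4.26} $$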


Because the proposal distribution $\pi(x_k|x_{k-1}, z_k)$ is no longer conditioned on the past
measurement history $z_{1:k-1}$, Equation (4.26) is suitable for recursive implementation.
As such, it is the form of the SIS weights used most often by practitioners. However,
regardless of whether (4.24) or (4.26) is used for $\{w_k^{(i)}\}$, the SIS algorithm yields the
following empirical estimate of the filtering density:

$$ \hat{p}_{N_p}(x_k|z_{1:k}) = \sum_{i=1}^{N_p} \tilde{w}_k^{(i)}\,\delta_{x_k^{(i)}}(x_k) \approx p(x_k|z_{1:k}), \tag{4.27} $$

where the approximation is in the sense indicated in (4.19), and $\{\tilde{w}_k^{(i)}\}$ are the normalized
importance weights. Although we will present a crucial improvement to the SIS
algorithm in the next section, we summarize our development thus far by the pseudo-code
description given in Figure 4.2. The code highlights the simplicity of implementation
that has made the SIS algorithm such a popular means of performing (approximate)
recursive Bayesian filtering.
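
As a rough illustration of the recursion in (4.26), the following Python sketch performs one SIS measurement update for a scalar state. Everything about the model is an assumption made for the example: a random-walk state, a direct noisy measurement, and a Gaussian proposal centered midway between the previous state and the measurement. The callables passed to `sis_step` are hypothetical names, not part of any library.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))


def sis_step(particles, weights, z, sample_proposal, proposal_pdf, transition_pdf, likelihood):
    """One SIS measurement update: draw x_k^(i) and apply the weight recursion (4.26)."""
    new_particles, new_weights = [], []
    for x_prev, w_prev in zip(particles, weights):
        x_new = sample_proposal(x_prev, z)  # x_k^(i) ~ pi(x_k | x_{k-1}^(i), z_k)
        ratio = (likelihood(z, x_new) * transition_pdf(x_new, x_prev)
                 / proposal_pdf(x_new, x_prev, z))
        new_particles.append(x_new)
        new_weights.append(w_prev * ratio)  # unnormalized weight w_k^(i)
    return new_particles, new_weights


# Assumed scalar model: x_k = x_{k-1} + u_k and z_k = x_k + v_k with Gaussian noises.
Q_STD, R_STD = 1.0, 0.5
rng = random.Random(1)


def transition_pdf(x_new, x_prev):
    return normal_pdf(x_new, x_prev, Q_STD)


def likelihood(z, x):
    return normal_pdf(z, x, R_STD)


def sample_proposal(x_prev, z):
    # Illustrative proposal that uses z_k: Gaussian centered between x_{k-1} and z_k.
    return rng.gauss(0.5 * (x_prev + z), Q_STD)


def proposal_pdf(x_new, x_prev, z):
    return normal_pdf(x_new, 0.5 * (x_prev + z), Q_STD)


particles = [rng.gauss(0.0, 1.0) for _ in range(1000)]
weights = [1.0 / 1000] * 1000
particles, weights = sis_step(
    particles, weights, 0.3, sample_proposal, proposal_pdf, transition_pdf, likelihood
)
total = sum(weights)
filtered_mean = sum(x * w for x, w in zip(particles, weights)) / total
```

Passing the transition density as the proposal makes the ratio collapse to the likelihood alone, which is the special case discussed later as the prior proposal.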


4.3  Degeneracy  of  the  SIS  Algorithm 

In the last section, we introduced the sequential extension of the importance sampling
algorithm. Although conceptually we now have a viable statistical approach for approximating
a recursive Bayesian filter, the SIS algorithm has a significant practical
shortcoming. We present this shortcoming in the following lemma.
Algorithm 1 Sequential Importance Sampling

$[\{x_k^{(i)}, w_k^{(i)}\}_{i=1}^{N_p}] = \mathrm{SIS}[\{x_{k-1}^{(i)}, w_{k-1}^{(i)}\}_{i=1}^{N_p}, z_k]$

• FOR $i = 1 : N_p$

  - Draw $x_k^{(i)} \sim \pi(x_k|x_{k-1}^{(i)}, z_k)$.

  - Calculate $w_k^{(i)}$ according to (4.26).

• END FOR


Figure 4.2: Pseudo-code description of the sequential importance sampling algorithm.


Lemma 3 Degeneracy of Importance Weights

The unconditional variance of the true importance weights cannot decrease with time,

$$ \mathrm{var}_{\pi(x_{0:k-1},\,z_{1:k-1})}[w^*_{k-1}] \le \mathrm{var}_{\pi(x_{0:k},\,z_{1:k})}[w^*_k], \tag{4.28} $$

where the subscripts on the variances are meant to highlight that the expectations are
taken with respect to both the measurement sequence and the imputed state sequence.
Furthermore, using the variance decomposition

$$ \mathrm{var}_{\pi(x_{0:k},\,z_{1:k})}[w^*_k] = E_{p(z_{1:k})}\big[\mathrm{var}_{\pi(x_{0:k}|z_{1:k})}[w^*_k|z_{1:k}]\big] \tag{4.29} $$

in (4.28) indicates that, for a given (nonrandom) sequence $z_{1:k}$, the conditional variance
of the true importance weights will tend to follow an increasing trend (although
not strictly so).

Proof:

See Appendix A.

Because $w_k = p(z_{1:k})\,w^*_k$ (compare Equations (4.10) and (4.14)), the conditional
variance of $w_k$ will follow the same increasing trend as well. Thus, while Lemma 3
does not rule out the possibility that, for a given realization $z_{1:k}$, $\mathrm{var}[w_k|z_{1:k}]$ could
actually decrease with $k$, in real applications this is never found [45, 53]. Instead, it is
reported that after a few iterations of the SIS algorithm, most of the normalized importance
weights will be nearly equal to zero. This is detrimental for two reasons. First,
the quality of the random measure $\hat{p}_{N_p}(x_k|z_{1:k})$, as well as any estimates based
upon it, can be expected to deteriorate as more weights become equal to zero. Second,
from a computational standpoint, it is inefficient to expend resources calculating support
points and weights $\{x_k^{(i)}, w_k^{(i)}\}$ whose contributions to estimates produced by the
filter are negligible. This is known in the literature as the problem of degeneracy [36].
In this section, we will discuss two approaches to reducing its occurrence.
4.3.1 Choice of the proposal distribution

When we introduced importance sampling in Section 4.1.3, the only constraint placed
upon the proposal distribution $\pi(x_{0:k}|z_{1:k})$ was that its support must contain the support
of the true distribution $p(x_{0:k}|z_{1:k})$. When we extended the algorithm to a sequential
setting, we enforced the additional constraint that the proposal distribution should
factor as $\pi(x_{0:k}|z_{1:k}) = \pi(x_0) \prod_{j=1}^{k} \pi(x_j|x_{j-1}, z_j)$ for each $k$. Other than these
two requirements, we were free to use whatever distribution was convenient. Unfortunately,
the strong law of large numbers (SLLN) convergence results for the SIS algorithm
apply only as $N_p \to \infty$. Thus, to obtain satisfactory performance for finite $N_p$, more care
is required when choosing $\pi(x_{0:k}|z_{1:k})$. In the context of alleviating the degeneracy of the
SIS algorithm, a sensible choice of $\pi(x_k|x_{k-1}, z_k)$ is the distribution that minimizes the
conditional variance of the importance weights.³ This optimal proposal distribution is given
in the following lemma.

Lemma 4 Optimal Proposal Distribution

The proposal distribution that minimizes $\mathrm{var}_\pi[w_k^{(i)}|x_{k-1}^{(i)}, z_k]$ is

$$ \pi_{opt}(x_k|x_{k-1}^{(i)}, z_k) = p(x_k|x_{k-1}^{(i)}, z_k). \tag{4.30} $$

Proof:

Beginning with (4.24), we have

$$ w_k^{(i)} = w_{k-1}^{(i)}\,\frac{p(z_k|x_k^{(i)})\,p(x_k^{(i)}|x_{k-1}^{(i)})}{\pi_{opt}(x_k^{(i)}|x_{k-1}^{(i)}, z_k)} \tag{4.31} $$

$$ = w_{k-1}^{(i)}\,\frac{p(x_k^{(i)}, z_k|x_{k-1}^{(i)})}{p(x_k^{(i)}|x_{k-1}^{(i)}, z_k)} \tag{4.32} $$

$$ = w_{k-1}^{(i)}\,p(z_k|x_{k-1}^{(i)}), \tag{4.33} $$

where we used the conditional independence of $z_k$ given $x_k$ to go from (4.31) to (4.32).
Thus, for the proposal distribution suggested in (4.30), the weight $w_k^{(i)}$ is conditionally
independent of the actual draw of the current state $x_k^{(i)}$, or $\mathrm{var}_{\pi_{opt}}[w_k^{(i)}|x_{k-1}^{(i)}, z_k] = 0$,
as stated.

There are two problems with the optimal proposal distribution. First, it requires the
ability to sample from $p(x_k|x_{k-1}^{(i)}, z_k)$, a distribution that may be nonstandard. Second,
calculation of $w_k^{(i)}$ as specified in (4.33) requires that we evaluate the integral

$$ p(z_k|x_{k-1}^{(i)}) = \int p(z_k|x_k)\,p(x_k|x_{k-1}^{(i)})\,dx_k, \tag{4.34} $$

a task that may be analytically intractable. Similar to our discussion on exact recursive
Bayesian algorithms in Section 3.1, there are two cases when the use of $\pi_{opt}$ is
possible [36]. The first is when $x_k$ is from a discrete finite state space. In that case,
the integral in (4.34) becomes a sum and sampling from $p(x_k|x_{k-1}^{(i)}, z_k)$ is possible.
The second case occurs for dynamic models with additive Gaussian noise processes
and linear measurement equations.⁴ With this model, Equation (4.34) can be solved
analytically because both terms in the integrand are Gaussian. Furthermore, because
the measurement process is linear, $p(x_k|x_{k-1}^{(i)}, z_k)$ is also Gaussian and can be sampled.
In general, such analytic evaluations are not possible for $\pi_{opt}$, and researchers
in the field have spent a good deal of effort developing approximations to the optimal
proposal distribution (see [45] for an overview). Some recent approaches include the
auxiliary particle filter [54] and approximations based upon the unscented transformation
[55,56] and the Metropolis-Hastings algorithm [57,58]. We will return to this
topic in Section 4.3.4 when we discuss the auxiliary particle filter.

³ Because of the factorization required by (4.25), $\pi(x_{0:k}|z_{1:k})$ is entirely specified by $\pi(x_k|x_{k-1}, z_k)$
and $\pi(x_0)$.

Figure 4.3: Example of how use of the prior $p(x_k|x_{k-1}^{(i)})$ as proposal distribution can
lead to degeneracy of the importance weights. (The plot shows two curves, labeled $p(z_k|x_k)$
and $p(x_k|x_{k-1})$.)
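
For the second tractable case (additive Gaussian process noise with a linear measurement equation), both the optimal proposal (4.30) and the integral (4.34) have closed forms. The Python sketch below works through a scalar instance. The model parameters `h`, `q`, `r` and the nonlinear function `f` are assumptions chosen only for illustration, and the final lines check numerically that the ratio in (4.31) does not depend on the drawn state, as (4.33) asserts.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x (parameterized by variance, not standard deviation)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def optimal_proposal(x_prev, z, f, h, q, r):
    """Mean and variance of p(x_k | x_{k-1}, z_k) for x_k = f(x_{k-1}) + u_k, z_k = h x_k + v_k,
    with u_k ~ N(0, q) and v_k ~ N(0, r)."""
    pred = f(x_prev)
    var = 1.0 / (1.0 / q + h * h / r)
    mean = var * (pred / q + h * z / r)
    return mean, var


def predictive_likelihood(x_prev, z, f, h, q, r):
    """p(z_k | x_{k-1}), the integral (4.34), which is Gaussian for this model."""
    return normal_pdf(z, h * f(x_prev), h * h * q + r)


def optimal_sis_update(x_prev, w_prev, z, f, h, q, r, rng):
    """Draw from the optimal proposal (4.30) and update the weight via (4.33)."""
    mean, var = optimal_proposal(x_prev, z, f, h, q, r)
    x_new = rng.gauss(mean, math.sqrt(var))
    return x_new, w_prev * predictive_likelihood(x_prev, z, f, h, q, r)


def f(x):
    # Hypothetical nonlinear state transition, used only for this example.
    return 0.5 * x + 25.0 * x / (1.0 + x * x)


h, q, r = 1.0, 2.0, 0.5
x_prev, z = 1.0, 12.0
mean, var = optimal_proposal(x_prev, z, f, h, q, r)

# The ratio in (4.31), evaluated at an arbitrary state value, equals p(z_k | x_{k-1}).
x_any = 12.3
ratio = (normal_pdf(z, h * x_any, r) * normal_pdf(x_any, f(x_prev), q)
         / normal_pdf(x_any, mean, var))
evidence = predictive_likelihood(x_prev, z, f, h, q, r)
x_new, w_new = optimal_sis_update(x_prev, 0.25, z, f, h, q, r, random.Random(4))
```

Because the weight depends only on the previous state and the measurement, it could even be computed before the new state is drawn.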

Before discussing the second method for mitigating the problem of degeneracy, we
mention one more possibility for the proposal distribution. Although it may be far from
optimal, a popular choice among practitioners is the so-called prior distribution

$$ \pi(x_k|x_{k-1}^{(i)}, z_k) = p(x_k|x_{k-1}^{(i)}). \tag{4.35} $$

Sampling from the prior is often straightforward. For example, in the case of an additive
noise model, $x_k = f_k(x_{k-1}) + u_k$, sampling from $p(x_k|x_{k-1}^{(i)})$ amounts to
sampling from the noise distribution $p(u_k)$. Furthermore, the weight update equation
assumes an even simpler form:

$$ w_k^{(i)} = w_{k-1}^{(i)}\,p(z_k|x_k^{(i)}). \tag{4.36} $$

Despite these benefits, use of $p(x_k|x_{k-1}^{(i)})$ as the proposal distribution can lead to poor
performance. Figure 4.3 demonstrates why this might be so. In the figure, the likelihood
$p(z_k|x_k)$ is much more peaked than $p(x_k|x_{k-1}^{(i)})$. Thus, if we were to use the
prior as the proposal distribution for this case, Equation (4.36) indicates that many of
the resulting samples $\{x_k^{(i)}\}$ could have negligible weights. However, this is precisely
the problem we were trying to alleviate by appropriate choice of $\pi(x_k|x_{k-1}^{(i)}, z_k)$. In
words, the prior as proposal distribution is suboptimal because it does not take the current
measurement $z_k$ into account. In cases where the variance of the system noise
is significantly greater than the variance of the measurement noise (as shown in Figure 4.3),
the prior tends to be a poor choice for the proposal distribution.

⁴ This is the standard Kalman filter framework except the relationship between the previous state and the
current state is allowed to be nonlinear: $x_k = f_k(x_{k-1}) + u_k$.
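
The effect described above can be reproduced with a small numerical experiment. The sketch below is an illustration under assumed values only: all particles start from the same previous state, the process noise is Gaussian with standard deviation 5, and the measurement noise standard deviation is either 0.05 (peaked likelihood) or 5 (broad likelihood). It counts how many normalized weights from (4.36) remain above 1% of the uniform weight; that threshold is an arbitrary choice made for the example.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))


def prior_proposal_weights(x_prev, z, process_std, meas_std, num_particles, rng):
    """Draw from the prior (4.35), weight by the likelihood (4.36), and normalize.

    The previous weights are taken to be uniform, so they cancel on normalization.
    """
    particles = [x_prev + rng.gauss(0.0, process_std) for _ in range(num_particles)]
    weights = [normal_pdf(z, x, meas_std) for x in particles]
    total = sum(weights)
    return [w / total for w in weights]


def count_significant(normalized_weights, fraction=0.01):
    """Number of weights exceeding `fraction` of the uniform weight 1/N."""
    n = len(normalized_weights)
    return sum(1 for w in normalized_weights if w > fraction / n)


rng = random.Random(2)
peaked = prior_proposal_weights(0.0, 1.0, 5.0, 0.05, 1000, rng)
broad = prior_proposal_weights(0.0, 1.0, 5.0, 5.0, 1000, rng)
n_peaked = count_significant(peaked)  # only a small fraction of particles matter
n_broad = count_significant(broad)    # nearly all particles contribute
```

With the peaked likelihood, most of the computational effort is spent on particles that contribute almost nothing to the estimate.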


4.3.2  Resampling 

The second method for reducing the effects of degeneracy in the SIS algorithm is resampling.
Recall that the SIS algorithm provides

$$ \hat{p}_{N_p}(x_k|z_{1:k}) = \sum_{i=1}^{N_p} \tilde{w}_k^{(i)}\,\delta_{x_k^{(i)}}(x_k) $$

as a discrete approximation to the true posterior $p(x_k|z_{1:k})$. Resampling can be thought
of as drawing $N_p$ independent samples from $\hat{p}_{N_p}(x_k|z_{1:k})$. In the following discussion,
we will use the notation $\tilde{x}_k^{(i)}$ to denote a state vector before resampling and $x_k^{(i)}$ to
denote a state vector after resampling. The resampling procedure can then be described
as mapping

$$ \{\tilde{x}_k^{(i)}, \tilde{w}_k^{(i)}\}_{i=1}^{N_p} \to \{x_k^{(i)}, 1/N_p\}_{i=1}^{N_p} \tag{4.37} $$

such that

$$ \Pr\big(x_k^{(i)} = \tilde{x}_k^{(j)}\big) = \tilde{w}_k^{(j)}. \tag{4.38} $$


Because resampling draws (in an approximate sense) from the true posterior, the resampled
importance weights are uniform, as shown in (4.37). Thus, resampling prevents
the SIS algorithm from degenerating by constructing a new random measure where all
support points have weight $1/N_p$. Equation (4.38) indicates that resampling tends to
multiply those states with significant weights while discarding those with negligible
weights. In this sense, resampling can be considered as focusing the "attention" of the
SIS algorithm on the most promising areas of the state space.

In order to know when to resample, we need a way of monitoring the degeneracy
of the SIS algorithm. In [53], Kong et al. suggested the effective sample size, $N_{eff}$, as
a measure of degeneracy. To arrive at their definition for $N_{eff}$, we first introduce the
efficiency ratio

$$ \eta_k(\pi, p) \triangleq \frac{\mathrm{var}_\pi[\hat{I}_{N_p}(g)|z_{1:k}]}{\mathrm{var}_p[I_{N_p}(g)|z_{1:k}]}, \tag{4.39} $$

where $\hat{I}_{N_p}(g)$ is the estimate of the integral $\int g(x_k)\,p(x_k|z_{1:k})\,dx_k$ obtained by importance
sampling (see (4.15)). Similarly, $I_{N_p}(g)$ is the Monte Carlo estimate of the
same integral, except the samples are drawn from the true posterior (see (4.2)). As
such, $\eta_k(\pi, p)$ is a measure of the inefficiency caused by not being able to sample from
$p(x_k|z_{1:k})$. It is reasonable to expect the conditional variance $\mathrm{var}_\pi[\hat{I}_{N_p}(g)|z_{1:k}]$ to
increase as the importance weights degenerate.

Ideally, we would like to define the effective sample size as the ratio $N_p/\eta_k(\pi, p)$,
but the efficiency ratio is typically unknown. Instead, we use the following result from
[48, 53] to approximate $\eta_k(\pi, p)$.


Lemma 5 Approximation for Efficiency Factor
For functions $g(x_k)$ that vary slowly with $x_k$,

$$ \frac{\mathrm{var}_\pi[\hat{I}_{N_p}(g)|z_{1:k}]}{\mathrm{var}_p[g|z_{1:k}]} \approx \frac{\mathrm{var}_\pi[w^*_k|z_{1:k}] + 1}{N_p}. $$

Proof:

See Appendix B.

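With this result, we have

$$ \frac{N_p}{\eta_k(\pi, p)} = N_p\,\frac{\mathrm{var}_p[I_{N_p}(g)|z_{1:k}]}{\mathrm{var}_\pi[\hat{I}_{N_p}(g)|z_{1:k}]} = \frac{\mathrm{var}_p[g|z_{1:k}]}{\mathrm{var}_\pi[\hat{I}_{N_p}(g)|z_{1:k}]} \approx \frac{N_p}{\mathrm{var}_\pi[w^*_k|z_{1:k}] + 1}. $$

Thus, as an approximation to $N_p/\eta_k(\pi, p)$, the effective sample size is defined as

$$ N_{eff} = \frac{N_p}{\mathrm{var}_\pi[w^*_k|z_{1:k}] + 1}. \tag{4.41} $$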


There are two practical difficulties with the use of $N_{eff}$. First, as discussed in Section 4.1.3,
we generally cannot calculate the true importance weights $\{w_k^{*(i)}\}$ because
$p(x_{0:k}|z_{1:k})$ can only be evaluated up to a constant of proportionality. Second, even if
$p(x_{0:k}|z_{1:k})$ can be evaluated, a closed-form expression for $\mathrm{var}_\pi[w^*_k|z_{1:k}]$ is typically
not available. Instead, recognizing that $(\mathrm{var}_\pi[w^*_k|z_{1:k}] + 1) = E_\pi[w_k^{*2}|z_{1:k}]$, we use
the approximation $N_p \tilde{w}_k^{(i)} \approx w_k^{*(i)}$ (compare (4.9) and (4.15)) to estimate the effective
sample size,

$$ \hat{N}_{eff} = \frac{1}{\sum_{i=1}^{N_p} \big(\tilde{w}_k^{(i)}\big)^2}, \tag{4.42} $$

where $\{\tilde{w}_k^{(i)}\}$ are the normalized importance weights. As a heuristic check of the validity
of (4.42), consider the two extreme distributions for $\{\tilde{w}_k^{(i)}\}$. In the best-case
scenario, the normalized weights will be uniform, $\tilde{w}_k^{(i)} = 1/N_p$ for all $i$, yielding
$\hat{N}_{eff} = N_p$. In the worst-case scenario, all but a single weight will be zero, yielding
$\hat{N}_{eff} = 1$. Thus, we find that $\hat{N}_{eff}$ agrees with our intuition as a measure of the effective
sample size. With $\hat{N}_{eff}$, we now have a simple yet powerful way of monitoring the degeneracy
of the SIS algorithm. After each measurement update, $\hat{N}_{eff}$ is calculated and
compared to a predetermined threshold, which we denote $N_{thresh}$. Resampling occurs
if $\hat{N}_{eff} < N_{thresh}$.
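
The estimate (4.42) and the resampling test are short enough to state directly in code. The following Python sketch is illustrative; the function names are hypothetical, and the input weights may be unnormalized because they are normalized internally.

```python
def effective_sample_size(weights):
    """Estimate of the effective sample size, 1 / sum of squared normalized weights (4.42)."""
    total = sum(weights)
    return 1.0 / sum((w / total) ** 2 for w in weights)


def needs_resampling(weights, n_thresh):
    """True when the estimated effective sample size falls below the threshold N_thresh."""
    return effective_sample_size(weights) < n_thresh


uniform_case = effective_sample_size([0.25, 0.25, 0.25, 0.25])   # best case: equals N_p
degenerate_case = effective_sample_size([0.0, 0.0, 1.0, 0.0])    # worst case: equals 1
```

The two calls at the end reproduce the best-case and worst-case values used in the heuristic check of (4.42).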

There are many different ways to implement the actual resampling operation such
that (4.38) is satisfied. Define $N_p^{(i)}$ as the discrete random variable corresponding to
the number of times $\tilde{x}_k^{(i)}$ is selected during resampling. (We suppress the subscript $k$
on $N_p^{(i)}$ for convenience.) Thus, $\sum_{i=1}^{N_p} N_p^{(i)} = N_p$. We recognize that $\{N_p^{(i)}\}$ obeys
a multinomial probability law. As such, resampling directly from the probability mass
function $\{\tilde{w}^{(i)}\}$ will result in $E[N_p^{(i)}|\tilde{w}^{(i)}] = N_p \tilde{w}^{(i)}$ and
$\mathrm{var}[N_p^{(i)}|\tilde{w}^{(i)}] = N_p \tilde{w}^{(i)}(1 - \tilde{w}^{(i)})$ for each $i$. However, as mentioned in [59],
it is preferable to use a resampling scheme that reduces the variances $\mathrm{var}[N_p^{(i)}|\tilde{w}^{(i)}]$.
This should, in turn, reduce the chance that resampling actually increases the extent of
degeneracy by selecting too many states with negligible weights.

Algorithm 2 Systematic Resampling

$[\{x_k^{(i)}, w_k^{(i)}\}_{i=1}^{N_p}] = \mathrm{RESAMPLE}[\{\tilde{x}_k^{(i)}, \tilde{w}_k^{(i)}\}_{i=1}^{N_p}]$

• Initialize CMF: $c^{(0)} = 0$.

• FOR $i = 1 : N_p$

  - Construct CMF: $c^{(i)} = c^{(i-1)} + \tilde{w}_k^{(i)}$.

• END FOR

• Draw $u_0 \sim U[0, N_p^{-1}]$ and set $j = 1$.

• FOR $i = 1 : N_p$

  - $u_i = u_0 + (i - 1) \cdot N_p^{-1}$

  - WHILE $u_i > c^{(j)}$

    * $j = j + 1$

  - END WHILE

  - Assign $x_k^{(i)} = \tilde{x}_k^{(j)}$.

  - Assign $w_k^{(i)} = 1/N_p$.

• END FOR

Figure 4.4: Pseudo-code description of the systematic resampling algorithm.

In our particle filter, we implement a scheme known as systematic resampling [60].
The algorithm is presented in Figure 4.4. Note that $\{c^{(i)}\}_{i=0}^{N_p}$ is the cumulative mass
function for the discrete density corresponding to the normalized weights,

$$ c^{(i)} = \sum_{j=1}^{i} \tilde{w}_k^{(j)} \quad \text{for } i = 1, 2, \ldots, N_p, $$

and $c^{(0)} = 0$. The main loop in the algorithm consists of comparing a value $u_i$ generated
on the interval $[0, 1]$ to the cumulative mass function and selecting the index
$\min_j \{j : c^{(j)} \ge u_i\}$. It is easiest to understand systematic resampling by considering
a simple example. In Figure 4.5, an illustration is provided for $N_p = 3$. The locations
of the dashed lines indicate which states have been selected by the resampling operation.
In this example, the particle $\tilde{x}_k^{(3)}$ is selected twice, so it has been multiplied at the expense of
discarding one of the other two particles, which had a lower normalized weight. It is important
to notice that the only random element in the systematic resampling algorithm is the draw for $u_0$;
the other selection points $\{u_i\}_{i=1}^{N_p}$ are deterministic given $u_0$. As such, we have the
following proposition.

Figure 4.5: Illustration of systematic resampling for $N_p = 3$. The locations of the
dashed lines indicate which states have been selected; in this case, $\tilde{x}_k^{(3)}$ is selected twice.


Proposition 3 Reduced Variation of Systematic Resampling

The systematic resampling algorithm yields a set of selection counts $\{N_p^{(i)}\}_{i=1}^{N_p}$ that
satisfy $E[N_p^{(i)}|\tilde{w}^{(i)}] = N_p \tilde{w}^{(i)}$ and $\mathrm{var}[N_p^{(i)}|\tilde{w}^{(i)}] = N_p\Delta^{(i)}(1 - N_p\Delta^{(i)})$ for all $i$,
where

$$ N_p\Delta^{(i)} = N_p \tilde{w}^{(i)} - \lfloor N_p \tilde{w}^{(i)} \rfloor. \tag{4.43} $$

The conditional variance in (4.43) is guaranteed to be strictly less than the conditional
variance of the multinomial probability law.
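
The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration rather than our production implementation: it returns the selected indices instead of copying state vectors, uses zero-based indexing, and adds a bounds guard and a round-off fix on the last CMF entry that are not part of the pseudo-code.

```python
import random


def systematic_resample(normalized_weights, rng):
    """Return the indices selected by systematic resampling (Algorithm 2).

    `normalized_weights` must sum to one; index j is selected for each selection
    point u_i that falls in the interval (c^(j-1), c^(j)] of the cumulative mass function.
    """
    n = len(normalized_weights)
    # Construct the cumulative mass function c^(1), ..., c^(N_p).
    cmf, running = [], 0.0
    for w in normalized_weights:
        running += w
        cmf.append(running)
    cmf[-1] = 1.0  # guard against round-off in the final entry

    u0 = rng.uniform(0.0, 1.0 / n)  # the only random draw in the algorithm
    indices, j = [], 0
    for i in range(n):
        u = u0 + i / n  # selection points are evenly spaced given u0
        while j < n - 1 and u > cmf[j]:
            j += 1
        indices.append(j)
    return indices


weights = [0.1, 0.2, 0.3, 0.4]
selected = systematic_resample(weights, random.Random(5))
counts = [selected.count(i) for i in range(len(weights))]
```

Because the selection points are evenly spaced, each count differs from $N_p \tilde{w}^{(i)}$ by less than one, which is the behavior summarized in Proposition 3.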


Although resampling alleviates the effects of degeneracy, it does introduce two new
problems [36,45]. First, from a practical point of view, resampling limits the ability to
parallelize the SIS algorithm because all weights $w_k^{(i)}$ must be summed during normalization.
Second, from a theoretical perspective, the samples $\{x_k^{(i)}\}$ will not be statistically
independent after resampling. As such, we lose the simple convergence results
discussed in Section 4.1.3. Nonetheless, under weak assumptions, it is still possible to
guarantee almost sure convergence of the empirical distributions generated by the SIS
algorithm toward the true ones, although the proofs become more involved [47,61]. For
now, we highlight the problem of sample impoverishment. Sample impoverishment occurs
when a state $x_k^{(i)}$ (or equivalently, a sequence $x_{0:k}^{(i)}$) with high importance weight
is selected many times during resampling, leading to a loss of diversity among the support
points. Even though the next iteration of importance sampling "re-diversifies" the
support set $\{x_k^{(i)}\}$, sample impoverishment can still have a detrimental effect if we are
interested in fixed-lag or fixed-point estimates. Although we have been considering the
specific case of filtered estimates thus far, it is straightforward to show that, for $L > 0$,

$$ \sum_{i=1}^{N_p} \tilde{w}_k^{(i)}\,g\big(x_{k-L}^{(i)}\big) \tag{4.44} $$

can be used as an approximation for the fixed-lag expectation $E_p[g(x_{k-L})|z_{1:k}]$, as
long as we are willing to maintain a partial state history $x_{k-L:k}^{(i)}$
for each $i$. In this case, if repeated resampling results in trajectories $x_{k-L:k}^{(i)}$ that
all share a few common ancestors at time $k - L$, then we expect the quality of the
estimate in (4.44) to be poor. However, in spite of these two difficulties, the need to
prevent degeneracy of the SIS algorithm is more important, and resampling has become
a component of virtually all SIS implementations.

Figure 4.6: Pseudo-code description of a generic particle filtering algorithm.

4.3.3  Generic  particle  filter 

With $\hat{N}_{eff}$ as a monitor of degeneracy and systematic resampling as a corrective measure,
we can now define an extension to the SIS algorithm. Figure 4.6 contains a
pseudo-code description of an algorithm that incorporates both importance sampling
and resampling. This algorithm has become increasingly popular over the last decade
and is known variously as particle filtering [62], the Bayesian bootstrap [11], the condensation
algorithm [13], and interacting particle approximations [63]. In this report,
we will adopt the term particle filter for the algorithm in Figure 4.6, as its usage has become
prevalent in the literature. Furthermore, we will refer to the support set $\{x_k^{(i)}\}_{i=1}^{N_p}$
as "particles." In Chapter 6, we will show how the particle filter's flexibility enables
us to perform recursive Bayesian filtering for the challenging task of RCS-based joint
tracking/classification.
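
To make the combination of importance sampling, degeneracy monitoring, and resampling concrete, the following Python sketch runs one plausible form of the generic particle filter on a toy problem. It is a sketch under stated assumptions and not the algorithm of Figure 4.6 verbatim: the prior (4.35) is used as the proposal, the model is a scalar random walk observed in Gaussian noise, and the threshold is set to $N_p/2$. All function and variable names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    u = (x - mean) / std
    return math.exp(-0.5 * u * u) / (std * math.sqrt(2.0 * math.pi))


def systematic_resample(normalized_weights, rng):
    """Indices chosen by systematic resampling (see Algorithm 2)."""
    n = len(normalized_weights)
    cmf, running = [], 0.0
    for w in normalized_weights:
        running += w
        cmf.append(running)
    cmf[-1] = 1.0
    u0, j, indices = rng.uniform(0.0, 1.0 / n), 0, []
    for i in range(n):
        while j < n - 1 and u0 + i / n > cmf[j]:
            j += 1
        indices.append(j)
    return indices


def particle_filter_step(particles, weights, z, sample_transition, likelihood, n_thresh, rng):
    """One cycle: SIS update with the prior proposal, then resample if N_eff is too small."""
    n = len(particles)
    particles = [sample_transition(x, rng) for x in particles]            # draw from (4.35)
    weights = [w * likelihood(z, x) for x, w in zip(particles, weights)]  # weight update (4.36)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all importance weights underflowed to zero")
    weights = [w / total for w in weights]
    n_eff = 1.0 / sum(w * w for w in weights)                             # estimate (4.42)
    if n_eff < n_thresh:
        indices = systematic_resample(weights, rng)
        particles = [particles[j] for j in indices]
        weights = [1.0 / n] * n
    return particles, weights


# Assumed toy model: x_k = x_{k-1} + u_k, z_k = x_k + v_k.
Q_STD, R_STD, N_P = 1.0, 0.5, 500


def sample_transition(x, rng):
    return x + rng.gauss(0.0, Q_STD)


def likelihood(z, x):
    return normal_pdf(z, x, R_STD)


rng = random.Random(3)
particles = [rng.gauss(0.0, 1.0) for _ in range(N_P)]
weights = [1.0 / N_P] * N_P
x_true, squared_errors = 0.0, []
for _ in range(50):
    x_true += rng.gauss(0.0, Q_STD)
    z = x_true + rng.gauss(0.0, R_STD)
    particles, weights = particle_filter_step(
        particles, weights, z, sample_transition, likelihood, N_P / 2, rng
    )
    estimate = sum(w * x for x, w in zip(particles, weights))  # filtered mean from (4.27)
    squared_errors.append((estimate - x_true) ** 2)
rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
```

Resampling only when the estimated effective sample size drops below the threshold limits the loss of diversity that resampling at every step would cause.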
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4.3.4  Auxiliary  particle  filter 


According  to  the  pseudo-code  description  in  Figure  4.6,  resampling  will  occur  if  the 
particle  weights  become  too  skewed.  Although  resampling  results  in  a  uniformly 
weighted  sample,  it  also  leads  to  a  loss  of  diversity  (i.e.,  sample  impoverishment). 
As  such,  it  is  important  to  choose  a  proposal  distribution  that  provides  particles  with 
relatively uniform weights. To accomplish this, it is necessary to incorporate the current
measurement $\mathbf{z}_k$ in the proposal distribution. In our case, the lack of a closed-form
expression for RCS makes this difficult. In this section, we will describe an extension
to the generic particle filtering algorithm that allows us to do so. The auxiliary particle
filter (APF) was originally introduced in [54] as an alternative to using the prior
$p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})$ as the proposal distribution. Our presentation follows that of [36].
After providing its key equations, we will demonstrate how the APF can be used in our
application.

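The APF operates by obtaining a sample from the joint density $p(\mathbf{x}_k, i \mid \mathbf{z}_{1:k})$, where
$i$ is the auxiliary variable that represents the index of the particle at scan $k-1$ from
which $\mathbf{x}_k$ is predicted (i.e., $\mathbf{x}_{k-1}^{(i)} \rightarrow \mathbf{x}_k$). Using Bayes’ rule, we can express this joint
density as

$$p(\mathbf{x}_k, i \mid \mathbf{z}_{1:k}) \propto p(\mathbf{z}_k \mid \mathbf{x}_k, i, \mathbf{z}_{1:k-1})\, p(\mathbf{x}_k \mid i, \mathbf{z}_{1:k-1})\, p(i \mid \mathbf{z}_{1:k-1}) \tag{4.45}$$

$$= p(\mathbf{z}_k \mid \mathbf{x}_k)\, p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})\, w_{k-1}^{(i)}. \tag{4.46}$$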


Next,  we  select  the  importance  function  to  satisfy  the  proportionality 


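$$\pi(\mathbf{x}_k, i \mid \mathbf{z}_{1:k}) \propto p(\mathbf{z}_k \mid \boldsymbol{\lambda}_k^{(i)})\, p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})\, w_{k-1}^{(i)}, \tag{4.47}$$

where $\boldsymbol{\lambda}_k^{(i)}$ is some characterization of $\mathbf{x}_k$ given $\mathbf{x}_{k-1}^{(i)}$. For example, $\boldsymbol{\lambda}_k^{(i)}$ could be the
conditional mean, $\boldsymbol{\lambda}_k^{(i)} = E[\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)}]$. With these definitions, a sample $(\mathbf{x}_k^{(j)}, i^{(j)})$
drawn from the proposal $\pi(\mathbf{x}_k, i \mid \mathbf{z}_{1:k})$ will have an importance weight proportional to

$$w_k^{(j)} \propto \frac{p(\mathbf{x}_k^{(j)}, i^{(j)} \mid \mathbf{z}_{1:k})}{\pi(\mathbf{x}_k^{(j)}, i^{(j)} \mid \mathbf{z}_{1:k})} \propto \frac{p(\mathbf{z}_k \mid \mathbf{x}_k^{(j)})}{p(\mathbf{z}_k \mid \boldsymbol{\lambda}_k^{(i^{(j)})})}, \tag{4.48}$$

where the second proportionality is obtained by taking the ratio of (4.46) to (4.47).
The weighted sample set $\{\mathbf{x}_k^{(j)}, i^{(j)}, w_k^{(j)}\}$ is then distributed (approximately) as
$p(\mathbf{x}_k, i \mid \mathbf{z}_{1:k})$. By simply discarding the auxiliary variables $\{i^{(j)}\}$, we obtain the desired
sample from $p(\mathbf{x}_k \mid \mathbf{z}_{1:k})$.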

At  first  sight,  it  might  not  be  apparent  how  this  alleviates  the  task  of  incorporating 
the  current  measurement  zk  into  the  proposal  distribution.  To  clarify  this,  we  factor  the 
proposal  as 


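$$\pi(\mathbf{x}_k, i \mid \mathbf{z}_{1:k}) = \pi(i \mid \mathbf{z}_{1:k})\, \pi(\mathbf{x}_k \mid i, \mathbf{z}_{1:k}) \tag{4.49}$$

$$\propto \left( p(\mathbf{z}_k \mid \boldsymbol{\lambda}_k^{(i)})\, w_{k-1}^{(i)} \right) p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)}), \tag{4.50}$$

where we have regrouped the terms from the definition of $\pi(\mathbf{x}_k, i \mid \mathbf{z}_{1:k})$ in (4.47) to
match the factorization in (4.49). Thus, a sample $(\mathbf{x}_k^{(j)}, i^{(j)})$ is generated in two steps.

1. Draw a sample of the previous index: $i^{(j)} \sim p(\mathbf{z}_k \mid \boldsymbol{\lambda}_k^{(i)})\, w_{k-1}^{(i)}$.

2. Given $i^{(j)}$, draw a sample of the current state: $\mathbf{x}_k^{(j)} \sim p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i^{(j)})})$.

Thus, we now see how the APF incorporates the current measurement into its proposal
distribution. Namely, the likelihood $p(\mathbf{z}_k \mid \boldsymbol{\lambda}_k^{(i)})$ is used to select previous samples $\mathbf{x}_{k-1}^{(i)}$
that are likely to lead to current samples that are well-matched to $\mathbf{z}_k$. What remains
is for us to specify $\boldsymbol{\lambda}_k^{(i)}$. A popular choice is to define $\boldsymbol{\lambda}_k^{(i)}$ as a sample from the prior,
$\boldsymbol{\lambda}_k^{(i)} \sim p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})$. With this choice of $\boldsymbol{\lambda}_k^{(i)}$, we obtain the pseudo-code description of
the APF algorithm provided in Figure 4.7.

Figure 4.7: Pseudo-code description of the auxiliary particle filter with $\boldsymbol{\lambda}_k^{(i)} \sim p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})$.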

There are three important things to note concerning the APF implementation in
Figure 4.7. First, the generic particle filter routine is called at the end of the APF
routine. Thus, when $\boldsymbol{\lambda}_k^{(i)}$ is chosen to be a sample from $p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})$, each APF iteration
is actually equivalent to two iterations of the generic particle filter. This is because the
first set of state samples is discarded after the auxiliary variables $\{i^{(j)}\}$ have been
chosen. Basically, the APF performs a “dry run” with the current data to see which
of the previous states are most promising. Then, when resampling to obtain $\{i^{(j)}\}$,
previous states that are promising are reselected many times. Those that do not match
$\mathbf{z}_k$ (when predicted to the current time) are discarded. This way, multiple states $\mathbf{x}_k^{(j)}$
will be generated from each promising previous state. This leads us to our second
point. The APF algorithm essentially performs resampling before state prediction and
weight update, as opposed to the generic particle filter, which resamples after these
operations have occurred. As such, sample impoverishment should be less of a problem
for the auxiliary particle filter. Finally, note that the APF with $\boldsymbol{\lambda}_k^{(i)} \sim p(\mathbf{x}_k \mid \mathbf{x}_{k-1}^{(i)})$ only
requires the ability to sample from the prior and evaluate the likelihood function up to
a constant of proportionality. Therefore, it is perfectly suited for use with a complex
measurement like RCS.

In  summary,  the  auxiliary  particle  filter  offers  the  ability  to  incorporate  the  current 
measurement  into  the  proposal  distribution.  Because  the  implementation  in  Figure  4.7 
requires  only  that  we  be  able  to  sample  from  the  prior,  RCS  can  be  incorporated  into  the 
data  vector.  In  Chapter  7,  we  will  investigate  the  trade-off  between  filter  performance 
and  computational  cost  for  the  APF  from  this  section  and  a  particle  filter  that  uses  the 
prior  as  proposal  distribution. 
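
To make the two-step sampling of (4.49)-(4.50) and the weight of (4.48) concrete, the following Python sketch performs one APF iteration with $\boldsymbol{\lambda}_k^{(i)}$ drawn from the prior. It is an illustration under stated assumptions, not a reproduction of Figure 4.7: the index draw uses plain multinomial sampling, the second stage is written inline rather than as a call to the generic particle filter, and `sample_prior` and `likelihood` are hypothetical user-supplied callables.

```python
import numpy as np

def apf_step(particles, weights, z, sample_prior, likelihood, rng):
    """One auxiliary particle filter iteration with lambda drawn from the prior.

    particles    : array of N previous states x_{k-1}^{(i)}
    weights      : normalized previous weights w_{k-1}^{(i)}
    sample_prior : callable (states, rng) -> samples from p(x_k | x_{k-1})
    likelihood   : callable (z, states) -> p(z_k | x_k), up to a constant
    """
    n = weights.size

    # "Dry run": predict each particle once to obtain lambda_k^{(i)}.
    lam = sample_prior(particles, rng)

    # Step 1: draw auxiliary indices with probability ~ p(z_k | lambda^{(i)}) w_{k-1}^{(i)}.
    first_stage = weights * likelihood(z, lam)
    first_stage = first_stage / first_stage.sum()
    idx = rng.choice(n, size=n, p=first_stage)

    # Step 2: predict again from the selected parents.
    new_particles = sample_prior(particles[idx], rng)

    # Importance weight from (4.48): p(z_k | x_k^{(j)}) / p(z_k | lambda^{(i^{(j)})}).
    new_weights = likelihood(z, new_particles) / likelihood(z, lam[idx])
    new_weights = new_weights / new_weights.sum()
    return new_particles, new_weights
```

Only sampling from the prior and pointwise evaluation of the likelihood are needed, which is the property that makes this variant attractive when the measurement function has no closed form.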
CHAPTER 5


KALMAN  FILTER-BASED 
TRACKING 


In  this  work,  we  are  primarily  interested  in  formulating  a  joint  approach  for  FM-band 
passive  radar  tracking  and  classification.  In  Chapter  2,  we  showed  that  radar  cross 
section has considerable potential as a discriminatory feature in the VHF band.
Unfortunately, results from the same chapter also highlighted the complex relationship that
exists between RCS and the angular aspect of the target. In Chapter 3, we specified that
our solution should be both recursive and (approximately) Bayesian optimal. However,
we discovered that the complex nature of RCS eliminated most popular Kalman
filter-based algorithms from consideration. Finally, in Chapter 4, we found that
sampling-based algorithms, such as the particle filter, offered the flexibility needed to incorporate
RCS  within  the  measurement  vector. 

In the next chapter, we will define both the state model $f_k$ and the measurement
model $h_k$ for our particle filter. Both are key results because they demonstrate not only
that tracking and classification can be performed jointly, but that they should be
performed jointly, whenever possible. To substantiate our claims, extensive experimental
results will be provided in Chapter 7. However, in order for these results to be
meaningful, we need a benchmark against which performance comparisons can be made.
That  is  the  subject  of  the  current  chapter. 

In  what  follows,  we  will  derive  an  EKF-based  filter  for  FM-band  passive  radar.  As 
discussed  in  Section  3.2.1,  the  lack  of  a  closed-form  relationship  for  RCS  prevents  it 
from  being  used  in  an  extended  Kalman  filter.  Because  this  leaves  delay  and  Doppler 
shift  as  measurements,  our  benchmark  system  will  have  tracking  capabilities  only.  In 
order  to  make  the  system  as  robust  as  possible  against  maneuvering  targets,  we  follow 
popular practice and implement it using the Interacting Multiple Model (IMM)
algorithm from Section 3.2.1. Thus, we will refer to our benchmark filter as the IMM-EKF.
In the following sections, we define the state models $\{f_k^{(\gamma)}\}$ for the IMM algorithm and
the measurement model $h_k^{(\gamma)}$. In addition, because the filter must operate in the presence
of clutter and missed detections, we also discuss data association.
5.1 State Models for the IMM-EKF


As  presented  in  Section  3.2.1,  in  many  tracking  applications,  the  same  target  maneuvers 
in  different  fashions  at  different  times.  For  example,  an  airplane  that  is  flying  at  a 
constant  speed  and  fixed  altitude  may  suddenly  begin  to  climb  or  turn  sharply.  In  cases 
such  as  these,  it  is  unlikely  that  a  single  state  model  fk  will  accurately  represent  the 
target’s  behavior  at  all  times.  Instead,  a  jump  Markov  system  (JMS)  often  provides  a 
better  model  of  the  underlying  dynamics.  The  JMS  dynamic  system  was  defined  in 
Equations  (3.39)  and  (3.40).  In  this  section,  we  specify  the  state  model  for  the  JMS. 

For our application, we will assume a jump Markov state model of the form

$$\mathbf{x}_k = F_k^{(\gamma_k)} \mathbf{x}_{k-1} + \mathbf{u}_k^{(\gamma_k)}, \tag{5.1}$$

where $\{\gamma_k\}$ is the Markov chain representing the maneuver sequence. The terms in
$\{\gamma_k\}$ take values in $\{1, \ldots, N\}$, where $N$ is the number of separate Kalman filters used
in the implementation. Given the previous state $\gamma_{k-1}$, the current state is drawn
according to the probabilities in the Markov transition matrix $P_{\mathrm{IMM}}$, where
$P_{\mathrm{IMM}}(i, j) = \Pr(\gamma_k = j \mid \gamma_{k-1} = i)$ for any $k$. Note that (5.1) is a linear state model, whereas (3.39)
represents the more general nonlinear case. Furthermore, we will assume that each of
the $N$ noise sequences is zero-mean white Gaussian, $\mathbf{u}_k^{(\gamma)} \sim \mathcal{N}(\mathbf{0}, Q_k^{(\gamma)})$. In this
case, the following three pieces of information are required to specify our state model:

1. the number of models (or separate extended Kalman filters) $N$,

2. the Markov transition matrix $P_{\mathrm{IMM}}$,

3. the parameters $\{F_k^{(\gamma)}, Q_k^{(\gamma)}\}$ for the models.

For  our  IMM-EKF,  we  choose  N  =  3.  As  expected,  each  model  is  associated  with 
a  different  type  of  motion.  The  first  is  a  constant  velocity  (CV)  model,  the  second  is 
a  constant  acceleration  (CA)  model,  and  the  third  is  a  coordinated  turn  (CT)  model. 
This  is  a  popular  choice  of  models,  which  has  proven  to  be  robust  against  maneuvering 
targets  [40,64].  We  defer  specifying  values  for  the  probabilities  in  Pimm  until  Chapter  7. 
This leaves us with the task of specifying $\{F_k^{(\gamma)}, Q_k^{(\gamma)}\}$ for each $\gamma$. Before doing
so, we must comment on the general nature of these matrices. Because the targets
we wish to track are free to climb and descend, the extended Kalman filters must be
implemented in three dimensions. However, it is often assumed that motion along a
given coordinate is uncoupled from motion along the other coordinates. If we assume
that the components in $\mathbf{x}_k$ are grouped according to their coordinate axes (i.e., x, y, or
z), the matrices $F_k^{(\gamma)}$ and $Q_k^{(\gamma)}$ then assume a block diagonal form.
In this case, it suffices to specify the matrices $\bar{F}_k^{(\gamma)}$ and $\bar{Q}_k^{(\gamma)}$ for a single coordinate.
5.1.1 Constant velocity model


The constant velocity (CV) model of our IMM-EKF is intended to represent the
dynamics of a nonmaneuvering aircraft. Target acceleration is modeled as zero-mean
white Gaussian noise. The components of the state vector are

$$\mathbf{x}_k^{(CV)} = [\, p_{x,k} \;\; v_{x,k} \;\; p_{y,k} \;\; v_{y,k} \;\; p_{z,k} \;\; v_{z,k} \,]', \tag{5.3}$$

where $p_{x,k}$ and $v_{x,k}$ denote position and velocity, respectively, along the x-axis at
scan $k$. The components for the y and z axes are defined in a similar manner. The
state matrix and noise covariance are [64]
where $\Delta t$ is the time between consecutive scans and $q^{(CV)}$ is a tuning parameter that
controls the strength of the noise for the CV model. The value for $q^{(CV)}$ is often
determined empirically, on an application-specific basis. We will discuss this point
further in Chapter 7.
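
As an illustration of how a single-coordinate block is expanded into the block diagonal matrices for the three-dimensional filter, the sketch below builds a CV model in Python. The state-transition block follows directly from constant-velocity kinematics. The noise block, however, is an assumption: it uses the continuous white-noise-acceleration form that is common in the tracking literature, which may differ from the form used in this work. The function names are hypothetical.

```python
import numpy as np

def cv_block(dt, q):
    """Single-coordinate CV blocks for the state [position, velocity].

    F follows from constant-velocity kinematics. Q is an ASSUMED form
    (continuous white-noise acceleration with intensity q).
    """
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    Q = q * np.array([[dt**3 / 3.0, dt**2 / 2.0],
                      [dt**2 / 2.0, dt]])
    return F, Q

def per_axis_block_diag(block):
    """Repeat a single-coordinate block along x, y, z (uncoupled axes)."""
    return np.kron(np.eye(3), block)

# Example: propagate [px vx py vy pz vz]' over a 2-second scan interval.
F_bar, Q_bar = cv_block(dt=2.0, q=1.0)
F = per_axis_block_diag(F_bar)
Q = per_axis_block_diag(Q_bar)
x_next = F @ np.array([0.0, 1.0, 0.0, 2.0, 0.0, 3.0])
```

Because the axes are uncoupled, `np.kron(np.eye(3), block)` is enough to assemble the full matrices, and the state ordering matches (5.3).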


5.1.2  Constant  acceleration  model 

The constant acceleration (CA) model of our IMM-EKF is intended to represent the
substantial, but transient, accelerations that are present at the beginning and end of
maneuvers (e.g., the transition from constant-velocity flight to a coordinated turn). Target
acceleration is modeled as Brownian motion. The components of the state vector $\mathbf{x}_k^{(CA)}$,
in this case, are

$$\mathbf{x}_k^{(CA)} = [\, p_{x,k} \;\; v_{x,k} \;\; a_{x,k} \;\; p_{y,k} \;\; v_{y,k} \;\; a_{y,k} \;\; p_{z,k} \;\; v_{z,k} \;\; a_{z,k} \,]', \tag{5.5}$$

where $a_{x,k}$ denotes acceleration along the x-axis at scan $k$, and the other components
are as defined in (5.3). The state matrix and noise covariance are [64]
(5.6)


where $q^{(CA)}$ is a tuning parameter that controls the strength of the noise for the CA
model. Similar to $q^{(CV)}$, the value for $q^{(CA)}$ is also best determined empirically.


5.1.3  Coordinated  turn  model 

The CV and CA models, which we have presented, both assume that the motion along
each coordinate is uncoupled from that of the other coordinates. This leads to block
diagonal matrices of the form shown in (5.2). While this is a convenient assumption,
actual target maneuvers produce motion that is highly correlated across the coordinate
directions [64]. The coordinated turn (CT) model attempts to remedy this shortcoming.

Figure 5.1: Example of IMM coordinated turn maneuver.

To  begin,  we  note  that  the  name  “coordinated  turn”  is  actually  a  bit  of  a  misnomer, 
although  its  use  is  prevalent  in  the  EKF  literature.  During  a  real  coordinated  turn,  what 
is  actually  being  “coordinated”  are  the  various  control  surfaces  of  the  airplane  in  order 
to  maintain  a  certain  angular  orientation  with  respect  to  the  velocity  vector.  Because 
delay  and  Doppler  measurements  provide  limited  information  about  target  orientation, 
EKF-based  filters  have  no  real  way  of  monitoring  whether  or  not  the  pilot  is  actually 
coordinating a turn. In the next chapter, we will demonstrate how RCS can be used
to define an aerodynamically correct CT model. For extended Kalman filters, what is
actually meant by a coordinated turn model is a constant-speed circular turn model. An
example of this sort of maneuver is given in Figure 5.1. In the figure, $(\mathbf{i}_x, \mathbf{i}_y, \mathbf{i}_z)$ are unit
vectors directed along the coordinate axes that the filter is using, and $\boldsymbol{\omega}$ is the angular
velocity vector. The turn rate of the maneuver is simply the magnitude of the angular
velocity, $\omega = \|\boldsymbol{\omega}\|$. This example is meant to illustrate that, while the maneuver does
occur in a plane, it is not necessary for that plane to be horizontal (i.e., parallel to the
$\mathbf{i}_x\mathbf{i}_y$-plane). Therefore, if the maneuver plane is vertical, the EKF-based CT model can
actually be used to represent climbs and descents as well.

We now develop an expression for $\bar{F}_k^{(CT)}$. While the definition of this state
matrix can be found throughout the tracking literature, details concerning its derivation
are seldom provided. For this reason, we will summarize how $\bar{F}_k^{(CT)}$ is determined.
To  begin,  in  a  coordinate  system  whose  xy-plane  coincides  with  the  maneuver  plane, 
velocity  and  acceleration  along  a  circular  path  can  be  expressed  as 
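$$\tilde{\mathbf{v}}_k = v \begin{bmatrix} -\sin(\omega k \Delta t) \\ \cos(\omega k \Delta t) \\ 0 \end{bmatrix}, \qquad \tilde{\mathbf{a}}_k = -\omega v \begin{bmatrix} \cos(\omega k \Delta t) \\ \sin(\omega k \Delta t) \\ 0 \end{bmatrix}, \tag{5.7}$$

where $v$ is the target’s constant speed, $v = \|\mathbf{v}_k\|$, and the tildes are used to indicate
that $\tilde{\mathbf{v}}_k$ and $\tilde{\mathbf{a}}_k$ are not expressed in the $(\mathbf{i}_x, \mathbf{i}_y, \mathbf{i}_z)$ coordinate system. Define $O$ as the
orthogonal transformation that aligns the maneuver coordinates with the $(\mathbf{i}_x, \mathbf{i}_y, \mathbf{i}_z)$
system. Then, if $\mathbf{v}_k$ and $\mathbf{a}_k$ are the velocity and acceleration vectors in filter coordinates,
we have

$$\begin{aligned}
\mathbf{v}_{k+1} &= O\,\tilde{\mathbf{v}}_{k+1} \\
&= O \cdot v \begin{bmatrix} -\sin(\omega (k+1) \Delta t) \\ \cos(\omega (k+1) \Delta t) \\ 0 \end{bmatrix} \\
&= O \cdot \left( \cos(\omega \Delta t)\, \tilde{\mathbf{v}}_k + \frac{\sin(\omega \Delta t)}{\omega}\, \tilde{\mathbf{a}}_k \right) \\
&= \cos(\omega \Delta t)\, \mathbf{v}_k + \frac{\sin(\omega \Delta t)}{\omega}\, \mathbf{a}_k.
\end{aligned} \tag{5.8}$$

The third line follows from the angle-sum identities for $\sin(\omega k \Delta t + \omega \Delta t)$ and $\cos(\omega k \Delta t + \omega \Delta t)$.
The same approach can be used to find expressions for $\mathbf{p}_{k+1}$ and $\mathbf{a}_{k+1}$ in terms of $\mathbf{p}_k$,
$\mathbf{v}_k$, and $\mathbf{a}_k$ (where $\mathbf{p}_k$ is the position vector expressed in filter coordinates). Combining
these expressions yields the desired result.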
Note that (5.8) corresponds to the second row in (5.9). Furthermore, while the overall
state matrix $F_k^{(CT)}$ is still block diagonal, the submatrices $\bar{F}_k^{(CT)}(\omega)$ are now coupled
through $\omega$. This is precisely what we wished to accomplish.

In order to calculate $\bar{F}_k^{(CT)}(\omega)$, we must first estimate $\omega$. For a planar circular
turn, the turn rate is simply $\omega = \|\mathbf{a}_k\|/v$. In general, a target is unlikely to execute a
perfectly circular turn. As such, it is better to estimate $\omega$ each scan by first estimating
the angular velocity vector,

$$\boldsymbol{\omega}_k = \frac{\mathbf{v}_k \times \mathbf{a}_k}{\mathbf{v}_k \cdot \mathbf{v}_k}, \qquad \omega_k = \frac{\|\mathbf{v}_k \times \mathbf{a}_k\|}{\mathbf{v}_k \cdot \mathbf{v}_k}. \tag{5.10}$$


To complete the specification of our CT model, we note that the components of
$\mathbf{x}_k^{(CT)}$ are identical to those of $\mathbf{x}_k^{(CA)}$ given in (5.5). In addition, as a third-order model,
$Q_k^{(CT)}$ has the same form as $Q_k^{(CA)}$ from (5.6). The only difference is the scaling
terms, $q^{(CT)}$ and $q^{(CA)}$. In general, $q^{(CT)}$ will be much smaller than $q^{(CA)}$ because the
acceleration causing the coordinated turn has already been incorporated in $\bar{F}_k^{(CT)}(\omega)$.
Finally, because $\lim_{\omega \to 0} \bar{F}_k^{(CT)}(\omega) = \bar{F}_k^{(CA)}$, the CT model can also function as a
constant acceleration model for a target that is not turning.
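
The turn-rate estimate in (5.10) and the single-coordinate CT block can be sketched in Python as follows. The matrix in `ct_block` is the form commonly given in the tracking literature for a constant-speed circular turn with state [position, velocity, acceleration]; its second row matches (5.8), and it reduces to the constant acceleration block as $\omega \to 0$. It is offered here as an assumption about the structure of the state matrix, not as a transcription of it, and the function names are hypothetical.

```python
import numpy as np

def turn_rate(v, a):
    """Angular velocity vector and turn rate from velocity and acceleration, per (5.10)."""
    v = np.asarray(v, dtype=float)
    a = np.asarray(a, dtype=float)
    omega_vec = np.cross(v, a) / np.dot(v, v)
    return omega_vec, np.linalg.norm(omega_vec)

def ct_block(omega, dt, eps=1e-8):
    """Single-coordinate block for the state [p, v, a] under a constant-speed turn.

    ASSUMED standard form; falls back to the constant acceleration block
    when the turn rate is numerically zero.
    """
    if abs(omega) < eps:
        return np.array([[1.0, dt, 0.5 * dt**2],
                         [0.0, 1.0, dt],
                         [0.0, 0.0, 1.0]])
    s, c = np.sin(omega * dt), np.cos(omega * dt)
    return np.array([[1.0, s / omega, (1.0 - c) / omega**2],
                     [0.0, c, s / omega],          # second row: equation (5.8)
                     [0.0, -omega * s, c]])
```

Estimating $\omega$ from the current velocity and acceleration estimates and then rebuilding the block each scan is what couples the three coordinates, since all three blocks share the same $\omega$.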
5.2 Measurement Model for the IMM-EKF


In this section, we present the measurement model for our IMM-EKF. The
measurement equation provided by the jump Markov framework is

$$\mathbf{z}_k = h_k^{(\gamma_k)}(\mathbf{x}_k) + \mathbf{w}_k. \tag{5.11}$$

Therefore, we must specify $h_k^{(\gamma)}$ for each model and the statistics of the measurement
noise sequence $\{\mathbf{w}_k\}$. Before proceeding, a comment must be made about the use of
multiple  transmitters  in  our  application.  As  discussed  in  Chapter  2,  the  passive  radar 
scenario  is  inherently  bistatic  (or  multistatic,  if  multiple  transmitters  are  used).  For  our 
simulations,  we  will  assume  that  the  scan  times  for  each  transmitter  are  interleaved  (i.e., 
that  the  receiver  processes  one  FM  radio  signal  at  a  time).  Because  FM  radio  stations 
can  be  modeled  as  “always  on,”  we  are  free  to  process  the  scattered  signals  from  these 
illuminators  at  any  time  and  in  any  sequence.  Furthermore,  if  multiple  radio  signals  are 
received  simultaneously  (using  a  multichannel  receiver),  back-to-back  measurement 
updates  are  allowed  within  the  recursive  Bayesian  framework.  All  this  is  to  say  that  it 
suffices  to  consider  the  measurement  update  for  a  single  transmitter  at  a  time.  In  this 
case,  we  are  left  with  the  following  single-target  measurement  vector, 

$$\mathbf{z}_k = [\, \tau_k \;\; d_k \,]', \tag{5.12}$$

where $\tau_k$ and $d_k$ are the delay and Doppler shift measurements, respectively, received
at scan $k$.$^1$ Because neither of these quantities (to a good approximation) is a function
of target acceleration, the fact that $\mathbf{x}_k^{(CA)}$ and $\mathbf{x}_k^{(CT)}$ include $\mathbf{a}_k$ is inconsequential.
Therefore, the measurement function will have the same form for each $\gamma$.

We begin by specifying $h_k^{(\gamma)}$. In this section, we consider tracking a single target
in a clutter-free environment. The issue of measurements of uncertain origin and data
association will be discussed in the next section. If we denote the position and velocity
vectors of the target at scan $k$ as $\mathbf{p}_k$ and $\mathbf{v}_k$, respectively, then $h_k^{(\gamma)}$ for each of the IMM
models is defined implicitly by

$$\tau_k = \frac{\|\mathbf{p}_k - \mathbf{p}_{t_k}\| + \|\mathbf{p}_k - \mathbf{p}_r\|}{c} \tag{5.13}$$

$$d_k = \frac{\mathbf{v}_k \cdot \mathbf{n}_{t_k}(\mathbf{p}_k)}{\lambda_{t_k}}, \tag{5.14}$$

where $\mathbf{p}_{t_k}$ is the location of the transmitter whose signal is being used during scan $k$,
$\lambda_{t_k}$ is its wavelength, $\mathbf{p}_r$ is the location of the receiver, and $c$ is the speed of light. Note
that $t_k$ maps the scan index into a transmitter index. Thus, if $N_{tx}$ is the total number
of transmitters used by our system, $t_k \in \{1, \ldots, N_{tx}\}$ for each $k$. In (5.14), $\mathbf{n}_{t_k}(\mathbf{p}_k)$ is
the inward normal at $\mathbf{p}_k$ of the ellipsoid with foci $(\mathbf{p}_{t_k}, \mathbf{p}_r)$ and major axis length $\tau_k c$.

$^1$ We assume that the beamwidth of the receiving antenna is too wide to supply useful angle of arrival
data.
This normal can be evaluated as

$$\mathbf{n}_{t_k}(\mathbf{p}_k) = \frac{\mathbf{p}_{t_k} - \mathbf{p}_k}{\|\mathbf{p}_{t_k} - \mathbf{p}_k\|} + \frac{\mathbf{p}_r - \mathbf{p}_k}{\|\mathbf{p}_r - \mathbf{p}_k\|}. \tag{5.15}$$

The equations for Doppler shift, (5.14)-(5.15), take this somewhat complicated form
due to the bistatic nature of our application. Because Doppler shift is proportionate
to range rate, the velocity vector $\mathbf{v}_k$ must be projected onto the normal of the
contour of constant range. In the bistatic scenario, this contour is an ellipsoid. As
a check of (5.15), consider a monostatic radar. In this case, $\mathbf{p}_{t_k} = \mathbf{p}_r$, yielding
$\mathbf{n}_{t_k}(\mathbf{p}_k) = -2\,\mathbf{i}_{p_k - p_r}$, where $\mathbf{i}_{p_k - p_r}$ is the unit normal directed (radially outward)
from the receiver towards the target. Because $\mathbf{v}_k \cdot \mathbf{i}_{p_k - p_r}$ is the range rate $\dot{R}_k$,
Equation (5.14) then yields $d_k = -2\dot{R}_k / \lambda_{t_k}$, the familiar relationship for Doppler shift.
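
The measurement function defined by (5.13)-(5.15), together with the monostatic check just described, can be written as a short Python sketch. The function names and the example geometry are illustrative assumptions; positions are in meters, velocities in meters per second, and the wavelength in meters.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in m/s

def bistatic_normal(p, p_tx, p_rx):
    """Inward normal of the constant-delay ellipsoid at p, per (5.15)."""
    u_t = (p_tx - p) / np.linalg.norm(p_tx - p)
    u_r = (p_rx - p) / np.linalg.norm(p_rx - p)
    return u_t + u_r

def delay_doppler(p, v, p_tx, p_rx, wavelength):
    """Return [tau_k, d_k] from (5.13) and (5.14)."""
    tau = (np.linalg.norm(p - p_tx) + np.linalg.norm(p - p_rx)) / C_LIGHT
    d = np.dot(v, bistatic_normal(p, p_tx, p_rx)) / wavelength
    return np.array([tau, d])

# Monostatic check: transmitter and receiver co-located at the origin,
# target at 5 km range receding radially at 50 m/s, wavelength 3 m.
origin = np.zeros(3)
z = delay_doppler(np.array([3000.0, 4000.0, 0.0]),
                  np.array([30.0, 40.0, 0.0]),
                  origin, origin, wavelength=3.0)
# z[0] equals 2 * 5000 / c and z[1] equals -2 * 50 / 3, as the text predicts.
```

A receding target yields a negative Doppler shift, consistent with the sign convention implied by the inward normal.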

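From (5.13)-(5.15), we see that the relationship between $\mathbf{x}_k$ and $\mathbf{z}_k$ for our
application is a nonlinear one. This is why it is necessary to use extended Kalman filters for
our benchmark system. As such, we will need to specify the Jacobian matrix

$$H_k^{(\gamma)} = \left. \frac{\partial h_k^{(\gamma)}(\mathbf{x})}{\partial \mathbf{x}} \right|_{\mathbf{x} = \hat{\mathbf{x}}_{k|k-1}^{(\gamma)}}. \tag{5.16}$$

Using the definitions of delay and Doppler shift from (5.13)-(5.15), we see that $H_k^{(\gamma)}$
has the following form,

$$H_k^{(\gamma)} = \begin{bmatrix}
\dfrac{\partial \tau_k}{\partial p_x} & 0 & 0 & \dfrac{\partial \tau_k}{\partial p_y} & 0 & 0 & \dfrac{\partial \tau_k}{\partial p_z} & 0 & 0 \\[2ex]
\dfrac{\partial d_k}{\partial p_x} & \dfrac{\partial d_k}{\partial v_x} & 0 & \dfrac{\partial d_k}{\partial p_y} & \dfrac{\partial d_k}{\partial v_y} & 0 & \dfrac{\partial d_k}{\partial p_z} & \dfrac{\partial d_k}{\partial v_z} & 0
\end{bmatrix}, \tag{5.17}$$

where the nine-dimensional state vector of the CA and CT models has been assumed
(see Equation (5.5)). For the CV model’s six-dimensional state vector, $H_k^{(CV)}$ would
be the same as (5.17), but with every third column eliminated. With some algebra, the
following equations for the partial derivatives can be obtained.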


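$$\frac{\partial \tau_k}{\partial \mathbf{p}} = -\frac{1}{c}\, \mathbf{n}_{t_k}(\mathbf{p}_k), \tag{5.18}$$

$$\frac{\partial d_k}{\partial \mathbf{v}} = \frac{\mathbf{n}_{t_k}(\mathbf{p}_k)}{\lambda_{t_k}}, \tag{5.19}$$

$$\frac{\partial d_k}{\partial \mathbf{p}} = \frac{1}{\lambda_{t_k}} \left( \frac{(\mathbf{p}_k - \mathbf{p}_{t_k})(\mathbf{p}_k - \mathbf{p}_{t_k})'}{\|\mathbf{p}_k - \mathbf{p}_{t_k}\|^3} + \frac{(\mathbf{p}_k - \mathbf{p}_r)(\mathbf{p}_k - \mathbf{p}_r)'}{\|\mathbf{p}_k - \mathbf{p}_r\|^3} - b_k I_3 \right) \cdot \mathbf{v}_k, \tag{5.20}$$

where $I_3$ is the 3x3 identity matrix, and the vectors $\mathbf{p}_k$ and $\mathbf{v}_k$ are used to denote the
position and velocity estimates from $\hat{\mathbf{x}}_{k|k-1}^{(\gamma)}$. The minus sign in (5.18) reflects the fact that
$\mathbf{n}_{t_k}$ is the inward normal, so the delay increases in the direction opposite to it. The coefficient $b_k$ is defined as

$$b_k = \frac{\|\mathbf{p}_k - \mathbf{p}_{t_k}\| + \|\mathbf{p}_k - \mathbf{p}_r\|}{\|\mathbf{p}_k - \mathbf{p}_{t_k}\|\, \|\mathbf{p}_k - \mathbf{p}_r\|}. \tag{5.21}$$

This completes our specification of $H_k^{(\gamma)}$ for the IMM models.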
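
Analytic Jacobians of this kind are easy to get wrong by a sign or a transpose, so it is worth checking them numerically. The Python sketch below assembles the 2x9 Jacobian from the partial derivatives of the delay and Doppler functions in (5.13)-(5.15) for a nine-dimensional state ordered as in (5.5). It is a sketch under stated assumptions: the function names are hypothetical, and a finite-difference comparison against `h` is one simple way to validate the result.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in m/s

def h(x, p_tx, p_rx, lam):
    """Delay/Doppler measurement for state x = [px vx ax py vy ay pz vz az]."""
    p, v = x[0::3], x[1::3]
    u, w = p - p_tx, p - p_rx
    n = -(u / np.linalg.norm(u) + w / np.linalg.norm(w))   # inward normal, (5.15)
    tau = (np.linalg.norm(u) + np.linalg.norm(w)) / C_LIGHT
    return np.array([tau, np.dot(v, n) / lam])

def jacobian(x, p_tx, p_rx, lam):
    """Analytic 2x9 Jacobian of h with the column layout of (5.17)."""
    p, v = x[0::3], x[1::3]
    u, w = p - p_tx, p - p_rx
    nu, nw = np.linalg.norm(u), np.linalg.norm(w)
    n = -(u / nu + w / nw)
    b = (nu + nw) / (nu * nw)                               # coefficient b_k
    dtau_dp = -n / C_LIGHT                                  # delay grows away from the foci
    dd_dv = n / lam
    dd_dp = (np.outer(u, u) / nu**3 + np.outer(w, w) / nw**3 - b * np.eye(3)) @ v / lam
    H = np.zeros((2, 9))
    H[0, 0::3] = dtau_dp      # position columns, delay row
    H[1, 0::3] = dd_dp        # position columns, Doppler row
    H[1, 1::3] = dd_dv        # velocity columns, Doppler row
    return H                  # acceleration columns stay zero
```

For the six-dimensional CV state, the same routine applies after deleting every third column of `H`.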

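A description of the measurement noise sequence is much more straightforward.
We assume that $\{\mathbf{w}_k\}$ is a zero-mean white Gaussian process, $\mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, R_k)$. We
also assume that the component noises for a given scan are independent, which yields

$$R_k = \begin{bmatrix} \sigma_{\tau,k}^2 & 0 \\ 0 & \sigma_{d,k}^2 \end{bmatrix}. \tag{5.22}$$

For the experimental results presented in Chapter 7, $\sigma_{\tau,k}^2$ and $\sigma_{d,k}^2$ will not vary with $k$.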


5.3  Data  Association  for  the  IMM-EKF 

In the previous section, we defined the IMM-EKF measurement model assuming that
only a single target was being tracked in a clutter-free environment with zero
probability of miss. In this simple case, $\mathbf{z}_k$ contains exactly one pair of delay-Doppler
measurements per scan. Furthermore, this single pair of measurements is guaranteed
to have originated at the target. A more realistic scenario would include multiple
targets, clutter returns (i.e., false alarms), and a nonzero probability that one or more of
the targets would go undetected in any given scan. In this case, $\mathbf{z}_k$ could contain more
or fewer than a single pair of delay-Doppler features.

Data association is the process of dealing with the uncertain origin of each
measurement in this case. It assigns measurements to target tracks so that the Kalman
update can occur. Methods of data association can be grouped into two broad categories:
those that perform hard assignments and those that perform soft assignments [64].
Hard assignment techniques attempt to determine the single “best” association of
measurements to targets for use in the Kalman update stage. This optimal assignment is
determined according to some suitably chosen criterion (e.g., maximum likelihood).
Soft assignment techniques use all measurements to update all tracks, but the
influence of each measurement varies with the probability that it was produced by the target
under consideration. For our IMM-EKF system, we will use joint probabilistic data
association (JPDA), a soft assignment algorithm that we will introduce in the following
sections.

Although we will consider multitarget tracking at the end of this chapter, it is
helpful to begin our discussion with the simpler case of tracking a single target in the
presence of clutter and missed detections. Variants of the following results can be found in
many sources [32,64-67].

5.3.1  Probabilistic  data  association 

In this section, we introduce the mathematical framework needed to perform
probabilistic data association (PDA) by considering the case of tracking a single target in the
presence of clutter and missed detections. In order to simplify notation, we will
suppress the superscript notation used earlier to indicate which of the IMM models was
being considered. However, because each model corresponds to a self-contained EKF,
the following results can be taken to apply to each filter within our IMM-EKF.

We begin with several definitions. Define $V_k$ as the volume of the measurement
space searched during scan $k$ for target detections. (Practical constraints prevent the
entire measurement space from being considered.) This subset of the measurement
space is usually centered upon $h_k(\hat{\mathbf{x}}_{k|k-1})$ and referred to as the gate. We assume that
the number of false alarms in each scan is Poisson-distributed with mean $\beta V_k$, where
$\beta$ is the density of false alarms per unit volume. We follow common practice and
assume that, conditioned upon the number of false alarms, their locations are distributed
uniformly across the gate volume. For each scan, the probability that the measurement
for a target being tracked is contained within the gate is denoted $P_G$. The probability
that the measurement is detected, conditioned on it being within the gate, is denoted
$P_D$. In actuality, the probability of detection is a function of the round-trip range from
transmitter to target to receiver (see Equation (2.2)), but it is common to simply assume
that $P_D$ is constant. If we denote the number of detections in scan $k$ as $m_k$, the EKF
measurement vector is then

$$\mathbf{z}_k = \left[\, \mathbf{z}_k^{(1)\prime} \;\; \mathbf{z}_k^{(2)\prime} \;\; \cdots \;\; \mathbf{z}_k^{(m_k)\prime} \,\right]'. \tag{5.23}$$

We define the vector of components from a single detection as

$$\mathbf{z}_k^{(j)} = [\, \tau_k^{(j)} \;\; d_k^{(j)} \,]', \quad j \in \{1, \ldots, m_k\}, \tag{5.24}$$

and, where it does not lead to confusion, we will refer to $\mathbf{z}_k^{(j)}$ as a single measurement
(even though it actually contains a delay-Doppler pair).

We will use $\Gamma_k^{(l)}$ to denote a possible association of measurements from $\mathbf{z}_k$ to
targets. In the general (multitarget) scenario, each association hypothesis would contain
three pieces of information: (1) the number of false alarms, (2) the number of targets
detected, and (3) a measurement index for each target (hypothesized) as having been
detected. The third item identifies which measurement goes with which target track. In
the single target scenario, this description simplifies because the number of detections
is either one or zero. Thus, there are $m_k + 1$ possible association hypotheses per scan.
If the target is detected, the correct measurement can be any one of the $m_k$ in $\mathbf{z}_k$. If the
target is not detected, all measurements are false. When considering the single target
scenario, the superscript of $\Gamma_k^{(j)}$ will have the following meaning. For $j = 1, \ldots, m_k$,
$\Gamma_k^{(j)}$ will denote the hypothesis that $\mathbf{z}_k^{(j)}$ is the “correct” measurement, and the other
$m_k - 1$ are false. $\Gamma_k^{(0)}$ will denote the hypothesis that the target went undetected, and
thus all $m_k$ measurements are false.$^2$ The goal of this section is to find expressions for
the conditional probabilities $\Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k})$, $j = 0, 1, \ldots, m_k$.

$^2$ When the superscript of the association hypothesis is a generic index with no implied meaning, we will
use $l$ instead of $j$ (e.g., $\Gamma_k^{(l)}$).
Using Bayes’ rule, we have

$$\Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k}) = \Pr(\Gamma_k^{(j)} \mid \mathbf{z}_k, \mathbf{z}_{1:k-1}) \propto p(\mathbf{z}_k \mid \Gamma_k^{(j)}, \mathbf{z}_{1:k-1})\, \Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k-1}). \tag{5.25}$$

Conditioned upon the association event $\Gamma_k^{(j)}$, the first term in (5.25) is simply the
product of the measurement likelihoods for each component $\mathbf{z}_k^{(j)}$. To simplify notation,
define $\mathbf{y}_k^{(j)}$ as the innovation for the $j$th measurement,

$$\mathbf{y}_k^{(j)} = \mathbf{z}_k^{(j)} - h_k(\hat{\mathbf{x}}_{k|k-1}). \tag{5.26}$$


Then, because the measurement noise is additive Gaussian and the false alarms are
uniformly distributed across the gate,

$$p(\mathbf{z}_k \mid \Gamma_k^{(j)}, \mathbf{z}_{1:k-1}) = \begin{cases} V_k^{-(m_k - 1)} P_G^{-1}\, \mathcal{N}(\mathbf{y}_k^{(j)}; \mathbf{0}, S_k) & j = 1, \ldots, m_k \\ V_k^{-m_k} & j = 0, \end{cases} \tag{5.27}$$

where the term $P_G^{-1}$ accounts for the fact that the Gaussian density is truncated by the
gate.$^3$ The second term in (5.25), the association probability, can be expressed as


$$\Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k-1}) = \Pr(\Gamma_k^{(j)}) \tag{5.28}$$

$$= \begin{cases} \dfrac{1}{m_k} P_D P_G\, \mu_F(m_k - 1) & j = 1, \ldots, m_k \\[1.5ex] (1 - P_D P_G)\, \mu_F(m_k) & j = 0, \end{cases} \tag{5.29}$$

where $P_D P_G$ is the probability that the correct measurement is in $\mathbf{z}_k$ (i.e., that the
measurement satisfies the gate and is detected), and $\mu_F$ is the Poisson mass function
for the false alarm count. Note that $\Gamma_k^{(j)}$ is independent of $\mathbf{z}_{1:k-1}$ because the number
of false alarms and detections are modeled as independent from scan to scan. (This
assumes that both $\beta$ and $P_D$ are known; otherwise, (5.28) would not hold.)

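We now arrive at the desired result by substituting (5.27) and (5.29) back into (5.25)
and simplifying:

$$\Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k}) = \frac{1}{C} \times
\begin{cases}
V_k P_D \, \frac{1}{m_k} \mu_F(m_k - 1) \, \mathcal{N}(\mathbf{y}_k^{(j)}; \mathbf{0}, \mathbf{S}_k) & j = 1, \ldots, m_k \\
(1 - P_D P_G) \, \mu_F(m_k) & j = 0,
\end{cases} \tag{5.30}$$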


where we absorbed the term $V_k^{-m_k}$ into the normalization constant $C$. The expression
in (5.30) can be simplified even further by replacing $\mu_F(m_k - 1)$ and $\mu_F(m_k)$ with
their actual probabilities. Because $\mu_F$ is a Poisson distribution with mean $\beta V_k$, we
have

$$\mu_F(m_k) = \frac{(\beta V_k)^{m_k} e^{-\beta V_k}}{m_k!}. \tag{5.31}$$

3 Typically, the gate is chosen such that $P_G \approx 1$ (i.e., the chance that the true measurement is outside the
gate is negligible).

Substituting (5.31) into (5.30) yields

$$\Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k}) = \frac{1}{C} \times
\begin{cases}
P_D \, \mathcal{N}(\mathbf{y}_k^{(j)}; \mathbf{0}, \mathbf{S}_k) & j = 1, \ldots, m_k \\
(1 - P_D P_G) \, \beta & j = 0,
\end{cases} \tag{5.32}$$


where we have absorbed terms common to both cases into the normalization constant.
Because the events $\Gamma_k^{(j)}$ for $j = 0, 1, \ldots, m_k$ are mutually exclusive and exhaustive, $C$
can be calculated as

$$C = (1 - P_D P_G) \, \beta + P_D \sum_{j=1}^{m_k} \mathcal{N}(\mathbf{y}_k^{(j)}; \mathbf{0}, \mathbf{S}_k). \tag{5.33}$$

Equation  (5.32)  is  the  result  that  we  wished  to  derive.  In  the  next  section,  we  will  use 
it  to  implement  the  probabilistic  data  association  filter. 
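
As a minimal sketch of how (5.32) and (5.33) might be evaluated, the following Python fragment computes the normalized association probabilities for a single scan. The function names, the restriction to a two-dimensional innovation, and all numerical values are assumptions made only for illustration; they are not part of the derivation above.

```python
import math

def gaussian_pdf_2d(y, S):
    """Zero-mean bivariate Gaussian density N(y; 0, S) for a 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    # Quadratic form y^T S^{-1} y, using the closed-form inverse of a 2x2 matrix.
    q = (d * y[0] ** 2 - (b + c) * y[0] * y[1] + a * y[1] ** 2) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

def pda_probabilities(innovations, S, p_d, p_g, beta):
    """Return [Pr(Gamma^(0)), ..., Pr(Gamma^(m))] following (5.32)-(5.33)."""
    weights = [(1.0 - p_d * p_g) * beta]                           # j = 0: target undetected
    weights += [p_d * gaussian_pdf_2d(y, S) for y in innovations]  # j = 1, ..., m
    c = sum(weights)                                               # constant C in (5.33)
    return [w / c for w in weights]

# Hypothetical scan with three gated measurements (innovations in arbitrary units).
innovations = [(0.2, -0.1), (1.5, 1.0), (-2.0, 0.5)]
S = ((1.0, 0.0), (0.0, 1.0))
probs = pda_probabilities(innovations, S, p_d=0.9, p_g=0.99, beta=0.05)
```

The entry `probs[0]` is the probability that every measurement in the scan is a false alarm; in this made-up example the first measurement, which has the smallest innovation, receives the largest probability.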


5.3.2  PDA  filter 

In the previous section, we considered the case of tracking a single target in clutter with
$P_D < 1$. The main result was an expression for $\Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k})$, the conditional probability
that the jth measurement is correct (or that all measurements are false if $j = 0$).
In this section, we will specify how these probabilities can be used to perform recursive
filtering in the presence of clutter and missed detections. We begin by considering the
task of performing data association across multiple scans.

Given a sequence of scans $\mathbf{z}_{1:k}$, define $\Gamma_{1:k}^{(l)}$ as the multiscan association event

$$\Gamma_{1:k}^{(l)} = \left\{ \Gamma_1^{(j_1)}, \Gamma_2^{(j_2)}, \ldots, \Gamma_k^{(j_k)} \right\}, \tag{5.34}$$

where each index $j_i$ satisfies $j_i \in [0, m_i]$. Note that $\Gamma_{1:k}^{(l)}$ is simply the concatenation
of $k$ single-scan association events. The total number of associations through $k$ scans
is then

$$M_k = \prod_{i=1}^{k} (m_i + 1), \tag{5.35}$$

which grows exponentially with $k$. Because the association events are mutually exclusive
and exhaustive, the posterior distribution of $\mathbf{x}_k$ can be expressed as

$$p(\mathbf{x}_k \mid \mathbf{z}_{1:k}) = \sum_{l=1}^{M_k} p(\mathbf{x}_k \mid \Gamma_{1:k}^{(l)}, \mathbf{z}_{1:k}) \, \Pr(\Gamma_{1:k}^{(l)} \mid \mathbf{z}_{1:k}). \tag{5.36}$$

As  discussed  in  Section  3.2.1,  we  see  that  the  presence  of  clutter  and  missed  detections 
is  likely  to  result  in  a  decidedly  non-Gaussian  posterior.  The  conditional  mean  then 
takes  the  form 


$$E[\mathbf{x}_k \mid \mathbf{z}_{1:k}] = \sum_{l=1}^{M_k} E[\mathbf{x}_k \mid \Gamma_{1:k}^{(l)}, \mathbf{z}_{1:k}] \, \Pr(\Gamma_{1:k}^{(l)} \mid \mathbf{z}_{1:k}). \tag{5.37}$$

Thus, in the presence of clutter and missed detections, the conditional mean $E[\mathbf{x}_k \mid \mathbf{z}_{1:k}]$
is a weighted sum of the expectations $E[\mathbf{x}_k \mid \Gamma_{1:k}^{(l)}, \mathbf{z}_{1:k}]$ taken over every possible multiscan
association hypothesis. If the system model, when conditioned on $\Gamma_{1:k}^{(l)}$, is linear
Gaussian, Equation (5.37) implies that $M_k$ Kalman filters are needed to compute
$E[\mathbf{x}_k \mid \mathbf{z}_{1:k}]$! Clearly, this is unacceptable for any real-time application. Instead, we will
have to consider suboptimal solutions.
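
To give a feel for how quickly $M_k$ in (5.35) grows, the short sketch below evaluates the product for a made-up scenario in which every scan contains three gated measurements. The function name and the scenario are illustrative assumptions only.

```python
from math import prod

def num_multiscan_hypotheses(m):
    """M_k = prod_i (m_i + 1) for per-scan measurement counts m = [m_1, ..., m_k]."""
    return prod(m_i + 1 for m_i in m)

# Hypothetical run: three gated measurements on every scan.
counts = [num_multiscan_hypotheses([3] * k) for k in (1, 5, 10, 20)]
# counts == [4, 1024, 1048576, 1099511627776]
```

Even in this modest case, the number of Kalman filters implied by (5.37) exceeds one million after only ten scans.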

There are two data association techniques that are widely used in modern radar systems.
The first, multiple hypothesis tracking (MHT), approximates (5.37) directly by
maintaining the filtered outputs from the top $N$ association hypotheses [67]. Of course,
with each new measurement vector $\mathbf{z}_k$, the set of $N$ hypotheses expands to $(m_k + 1)N$
possibilities, which must then be reduced back to a manageable level. The MHT algorithm
accomplishes this by combining the filtered outputs of hypotheses that are
sufficiently similar and discarding those associations whose conditional probabilities
become too small. The second popular approach, the probabilistic data association filter
(PDAF), “re-Gaussianizes” its estimate of the posterior density after each scan [68].
We present its key equations next.

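The PDA filter can be motivated as follows. Assume that the posterior given $\mathbf{z}_{1:k-1}$
is actually Gaussian,

$$p(\mathbf{x}_{k-1} \mid \mathbf{z}_{1:k-1}) = \mathcal{N}(\mathbf{x}_{k-1}; \hat{\mathbf{x}}_{k-1|k-1}, \Sigma_{k-1|k-1}), \tag{5.38}$$

for some choice of $\hat{\mathbf{x}}_{k-1|k-1}$ and $\Sigma_{k-1|k-1}$. Note that, unlike (5.36), there is no conditioning
on association events in (5.38). Next, express the posterior given $\mathbf{z}_{1:k}$ as

$$p(\mathbf{x}_k \mid \mathbf{z}_{1:k}) = \sum_{j=0}^{m_k} p(\mathbf{x}_k \mid \Gamma_k^{(j)}, \mathbf{z}_{1:k}) \, \Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k}). \tag{5.39}$$

Then, if the clutter-free system were linear Gaussian, we would have

$$p(\mathbf{x}_k \mid \Gamma_k^{(j)}, \mathbf{z}_{1:k}) = \mathcal{N}(\mathbf{x}_k; \hat{\mathbf{x}}_{k|k}^{(j)}, \Sigma_{k|k}^{(j)}), \tag{5.40}$$

where $\hat{\mathbf{x}}_{k|k}^{(j)}$ and $\Sigma_{k|k}^{(j)}$ denote the output of a single Kalman iteration using $\mathbf{z}_k^{(j)}$ as measurement
and $\mathcal{N}(\hat{\mathbf{x}}_{k-1|k-1}, \Sigma_{k-1|k-1})$ as prior.4 Plugging (5.40) into (5.39), the probabilistic
data association filter makes the approximation

$$p(\mathbf{x}_k \mid \mathbf{z}_{1:k}) = \sum_{j=0}^{m_k} \mathcal{N}(\mathbf{x}_k; \hat{\mathbf{x}}_{k|k}^{(j)}, \Sigma_{k|k}^{(j)}) \, \Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k}) \tag{5.41}$$

$$\approx \mathcal{N}(\mathbf{x}_k; \hat{\mathbf{x}}_{k|k}, \Sigma_{k|k}), \tag{5.42}$$

where $\hat{\mathbf{x}}_{k|k}$ and $\Sigma_{k|k}$ are the mean and covariance of the Gaussian mixture distribution
in (5.41). In words, at each scan, the PDA algorithm approximates the previous posterior
$p(\mathbf{x}_{k-1} \mid \mathbf{z}_{1:k-1})$ as Gaussian so that a Kalman filter (or an EKF) can be used to
compute $p(\mathbf{x}_k \mid \Gamma_k^{(j)}, \mathbf{z}_{1:k})$ for $j = 0, 1, \ldots, m_k$. The resulting mixture distribution is
then “re-Gaussianized” for the next scan.

4 Note that the superscript notation does not refer to IMM models in this section.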

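Because the individual densities $p(\mathbf{x}_k \mid \Gamma_k^{(j)}, \mathbf{z}_{1:k})$ are never actually required, it suffices
to compute $\hat{\mathbf{x}}_{k|k}$ and $\Sigma_{k|k}$ directly. (Note, these will not, in general, be equal to the
true conditional mean and covariance.) The equation for the PDAF mean is (see [69]
for details)

$$\hat{\mathbf{x}}_{k|k} = \hat{\mathbf{x}}_{k|k-1} + \mathbf{K}_k \bar{\mathbf{y}}_k \quad \text{where} \quad \bar{\mathbf{y}}_k = \sum_{j=1}^{m_k} \mathbf{y}_k^{(j)} \, \Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k}) \tag{5.43}$$

and $\Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k})$ is given by (5.32). The equation for the PDAF covariance is

$$\Sigma_{k|k} = \Sigma^{0}_{k|k} + \mathbf{K}_k \left( \sum_{j=1}^{m_k} \mathbf{y}_k^{(j)} \mathbf{y}_k^{(j)\top} \Pr(\Gamma_k^{(j)} \mid \mathbf{z}_{1:k}) - \bar{\mathbf{y}}_k \bar{\mathbf{y}}_k^{\top} \right) \mathbf{K}_k^{\top}, \tag{5.44}$$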

where  is  the  covariance  if  =  1, 

$$\Sigma^{0}_{k|k} = (1 - P_D P_G) \, \Sigma_{k|k-1} + P_D P_G \, \Sigma^{*}_{k|k}, \tag{5.45}$$

and $\Sigma^{*}_{k|k}$ and $\mathbf{K}_k$ are the covariance and gain matrix, respectively, of the standard Kalman
update. Inspection of Equations (5.43) and (5.44) reveals that the PDAF can be implemented
using a single Kalman filter (or EKF), with (5.43)-(5.45) replacing the standard Kalman update.
As such, the PDA filter is computationally much less expensive than the MHT algorithm.
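
The update in (5.43)-(5.45) can be sketched for the simplest possible case, a scalar state that is observed directly, so that every matrix reduces to a number. This is only an illustration under stated assumptions: the function name, the scalar model $h(x) = x$, and the numerical values are hypothetical, and (5.45) is used in the form transcribed above.

```python
import math

def pdaf_update_scalar(x_pred, p_pred, r, measurements, p_d, p_g, beta):
    """One PDAF measurement update for a scalar state observed directly, h(x) = x."""
    s = p_pred + r                      # innovation variance S_k
    k_gain = p_pred / s                 # standard Kalman gain K_k
    p_star = (1.0 - k_gain) * p_pred    # variance after a standard Kalman update
    y = [z - x_pred for z in measurements]           # innovations, as in (5.26)

    # Association probabilities from (5.32)-(5.33), with a scalar Gaussian density.
    w = [(1.0 - p_d * p_g) * beta]
    w += [p_d * math.exp(-0.5 * yj * yj / s) / math.sqrt(2.0 * math.pi * s) for yj in y]
    c = sum(w)
    prob = [wj / c for wj in w]

    y_bar = sum(pj * yj for pj, yj in zip(prob[1:], y))   # combined innovation
    x_upd = x_pred + k_gain * y_bar                        # (5.43)
    p0 = (1.0 - p_d * p_g) * p_pred + p_d * p_g * p_star   # (5.45)
    spread = sum(pj * yj * yj for pj, yj in zip(prob[1:], y)) - y_bar ** 2
    p_upd = p0 + k_gain * spread * k_gain                  # (5.44)
    return x_upd, p_upd

# Hypothetical scan: two measurements placed symmetrically about the prediction.
x_upd, p_upd = pdaf_update_scalar(
    x_pred=10.0, p_pred=4.0, r=1.0, measurements=[9.0, 11.0],
    p_d=0.9, p_g=0.99, beta=0.05)
```

Because the two measurements in this example are equally likely and pull in opposite directions, the combined innovation is zero and the mean is left at its predicted value, while the spread term in (5.44) inflates the variance to reflect the unresolved ambiguity.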

Finally,  we  note  that  it  is  straightforward  to  implement  the  PDA  algorithm  within 
the IMM framework. Only two things need to be changed. First, each of the IMM filters
will use (5.43)-(5.45) instead of the standard Kalman (or EKF) update equations.
Second,  each  model  likelihood  A^7'  must  now  be  averaged  over  the  possible  data  as¬ 
sociations.  This  average  takes  the  simple  form  A^7-*  =  J2j=o  Pr(r£*’  r^lzi:*)>  where 
pO'.7)  tjje  even);  that  the  jth  measurement  is  taken  as  correct  for  the  7th  IMM  filter. 
More  details  can  be  found  in  [64]. 

5.3.3  JPDA  filter 

In this section, we extend our discussion to the case of multitarget tracking in the
presence of clutter and missed detections. Although the PDA filter from the previous
section can still be used in this case, it tends to give poor results because its clutter
model is no longer valid. Recall, the PDA algorithm assumes that, at most, one measurement
from $\mathbf{z}_k$ corresponds to an actual target. Because this true measurement is
detected with probability $P_D P_G > 0.7$ (typically), it can be considered as relatively
“persistent” from scan to scan. The false alarms, on the other hand, are assumed to
be independent from scan to scan, with a uniform distribution across the gate volume.
As such, false alarms are unlikely to form persistent sequences across multiple scans
(unless the clutter density $\beta$ is high). When the gates of two or more targets overlap,
there will then be multiple persistent sources, and this degrades the performance of the
PDA filter. Joint probabilistic data association (JPDA) overcomes this limitation by
computing the association probabilities jointly for all target tracks whose gates overlap [70].

Figure 5.2: Illustration of measurement gates and the corresponding validation matrix.

The  JPDA  algorithm  relies  upon  the  notion  of  feasible  associations.  A  feasible 
association  of  data  to  targets  is  one  that  satisfies  two  criteria:  (1)  no  measurement  can 
have  more  than  one  target  as  its  source,  and  (2)  in  any  given  scan,  no  target  can  produce 
more  than  one  measurement.  The  actual  algorithm  has  four  basic  steps:  formation  of 
the validation matrix, enumeration of all feasible associations, calculation of association
probabilities, and update of the posterior densities.

The first step, formation of the validation matrix, reduces computation in the subsequent
steps. In Figure 5.2, a simple example involving two targets is shown. For
notational simplicity, the scan subscript $k$ is suppressed. We use $\hat{\mathbf{z}}_1$ and $\hat{\mathbf{z}}_2$ to denote
the predicted measurements for targets $t_1$ and $t_2$.5 $\{\mathbf{z}^{(1)}, \ldots, \mathbf{z}^{(m)}\}$ is the set of delay-Doppler
measurements received. The extent of a target’s validation gate is chosen so
that any measurement falling outside is very unlikely to have been produced by the
target. This gating eliminates many feasible associations that would have negligible
probabilities. The gates in Figure 5.2 are represented by the two ellipses. The resulting
validation matrix $\Omega$ is shown at the right. Each row of the matrix corresponds to a different
measurement while each column corresponds to a different target. The column
$t_0$ is included to represent the possibility that a measurement is a false alarm. Thus,
$\Omega(j, t) = 1$ means that the jth measurement may have originated from target $t$. The
need for a joint approach to data association is highlighted whenever a measurement
lies within multiple gates. In Figure 5.2, we see that one of the measurements lies within
both gates. Data assignment for the two targets must proceed jointly because association of
this measurement with $t_1$ precludes its association with $t_2$, and vice versa.
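
As a rough sketch of this first step, a validation matrix can be formed by thresholding the squared Mahalanobis distance between each measurement and each target’s predicted measurement. The function name, the elliptical gate test, the gate threshold, and the distances below are all assumptions chosen for illustration; the values are not read from Figure 5.2.

```python
def validation_matrix(d2, gate):
    """Build Omega from squared Mahalanobis distances.

    d2[j][t] is the squared distance of measurement j from the predicted
    measurement of target t + 1. Column 0 of the result is the false-alarm
    column t_0 and is always 1.
    """
    return [[1] + [1 if dist <= gate else 0 for dist in row] for row in d2]

# Hypothetical two-target scan with four measurements (made-up distances).
d2 = [
    [1.2, 30.0],   # measurement 1: inside the gate of t_1 only
    [4.0, 5.5],    # measurement 2: inside both gates
    [50.0, 2.1],   # measurement 3: inside the gate of t_2 only
    [40.0, 35.0],  # measurement 4: outside both gates
]
omega = validation_matrix(d2, gate=9.21)
# omega == [[1, 1, 0], [1, 1, 1], [1, 0, 1], [1, 0, 0]]
```

The second row, which contains a one in both target columns, is the kind of shared measurement that forces the association to be carried out jointly.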

The second step in the JPDA algorithm requires the enumeration of all feasible associations.
Conceptually, a feasible association can be found by changing ones to zeros
within $\Omega$ until each row contains exactly one nonzero entry and each column contains
at most one nonzero entry (except for column $t_0$, which may contain any number of
ones). If the clutter density is high, it may become impractical to enumerate all feasible
associations. In cases such as these, a technique such as Murty’s algorithm can be used
to return the top $N$ associations [71]. Murty’s algorithm offers the significant benefit
that the search proceeds in order; namely, the best assignment is determined first, followed
by the second best, third best, and so on. Thus, it is possible to terminate the
search after the first $N$ feasible associations have been found.

5 In this section, the variable $t$ is used for the target index instead of the transmitter index.
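
For small problems, the feasible associations can be listed by brute force. The sketch below is one hypothetical way to do so: it tries every combination of allowed sources and keeps those in which no target is used twice. It is not Murty’s algorithm and makes no attempt to rank the associations; the function name and the example matrix are assumptions.

```python
from itertools import product

def feasible_associations(omega):
    """Enumerate feasible associations for a validation matrix omega.

    Each association is a tuple a with a[j] = t, meaning that measurement j is
    assigned to target t, where t = 0 denotes a false alarm.
    """
    # Candidate sources for each measurement: the columns that hold a one.
    options = [[t for t, flag in enumerate(row) if flag] for row in omega]
    feasible = []
    for assoc in product(*options):
        targets = [t for t in assoc if t != 0]
        # Each real target may be the source of at most one measurement.
        if len(targets) == len(set(targets)):
            feasible.append(assoc)
    return feasible

# Hypothetical validation matrix: three measurements, two targets.
omega = [[1, 1, 0], [1, 1, 1], [1, 0, 1]]
assocs = feasible_associations(omega)
```

Here 12 combinations are examined and 8 survive; the four rejected combinations are those that assign the same target to two different measurements.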

The third step in the JPDA algorithm involves computation of $\Pr(\Gamma_k^{(l)} \mid \mathbf{z}_{1:k})$ for
each feasible association. In this case, $\Gamma_k^{(l)}$ will associate measurements from $\mathbf{z}_k$ with
multiple targets. An expression for these multitarget association probabilities can be
derived in the same way as the single-target probabilities from Section 5.3.1. Therefore,
we will simply state the result. Further details can be found in [70]. Defining $N_D^l$
and $N_F^l$ as the number of detections and false alarms, respectively, in the association
provided by $\Gamma_k^{(l)}$, it can be shown that

$$\Pr(\Gamma_k^{(l)} \mid \mathbf{z}_{1:k}) = \frac{(P_D)^{N_D^l} \left( (1 - P_D P_G) \beta \right)^{N_F^l}}{C} \prod_{t \in \Gamma_k^{(l)}} \mathcal{N}(\mathbf{y}_{k,t}^{(l)}; \mathbf{0}, \mathbf{S}_{k,t}), \tag{5.46}$$


where $C$ is a normalization constant and the double subscripts refer to the scan ($k$)
and the target index ($t$). The product in (5.46) is over all targets that association $\Gamma_k^{(l)}$
specifies as having been detected. The innovation $\mathbf{y}_{k,t}^{(l)}$ is formed using the
measurement that $\Gamma_k^{(l)}$ associates with target $t$. Note the similarity between (5.46) and
the single-target results in (5.32). Basically, for each target that $\Gamma_k^{(l)}$ specifies as having
been detected, there is a factor of $P_D \, \mathcal{N}(\mathbf{y}_{k,t}^{(l)}; \mathbf{0}, \mathbf{S}_{k,t})$ in (5.46). For each false alarm
in $\Gamma_k^{(l)}$, there is a factor of $(1 - P_D P_G)\beta$.
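
The factor-by-factor description above translates directly into a short computation. The sketch below is a hypothetical illustration of (5.46): the function names, the table of precomputed Gaussian likelihoods, and the numbers are assumptions. The second helper sums the joint probabilities over all associations that pair a given measurement with a given target, which is the usual way per-target weights are obtained from joint events in JPDA.

```python
def jpda_joint_probabilities(assocs, lik, p_d, p_g, beta):
    """Normalized probabilities of feasible joint associations, following (5.46).

    assocs: list of tuples with a[j] = t (0 = false alarm, t >= 1 = target index).
    lik[j][t]: Gaussian innovation likelihood of measurement j for target t
               (column 0 is unused).
    """
    weights = []
    for a in assocs:
        w = 1.0
        for j, t in enumerate(a):
            if t == 0:
                w *= (1.0 - p_d * p_g) * beta   # factor for each false alarm
            else:
                w *= p_d * lik[j][t]            # factor for each detected target
        weights.append(w)
    c = sum(weights)                            # normalization constant C
    return [w / c for w in weights]

def marginal_probability(assocs, probs, j, t):
    """Probability that measurement j originated from source t."""
    return sum(p for a, p in zip(assocs, probs) if a[j] == t)

# Hypothetical scan: two measurements, two targets with non-overlapping gates.
assocs = [(0, 0), (1, 0), (0, 2), (1, 2)]
lik = [[0.0, 0.12, 0.0], [0.0, 0.0, 0.08]]
probs = jpda_joint_probabilities(assocs, lik, p_d=0.9, p_g=0.99, beta=0.05)
p_11 = marginal_probability(assocs, probs, j=0, t=1)
```

In this made-up example the association that assigns each measurement to its own target dominates, because each detection factor is far larger than the false-alarm factor.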

The  fourth  and  final  step  of  the  JPDA  algorithm  involves  the  modification  of  the 
Kalman  update  equations.  The  JPDA  update  equations  are  identical  to  those  of  the 
PDAF,  (5.43)-(5.45).  The  only  difference  is  that  the  joint  association  probabilities 
from  the  previous  step  are  used  instead  of  the  single-target  probabilities.  It  is  these  joint 
probabilities  that  allow  the  JPDA  algorithm  to  handle  targets  with  overlapping  gates. 
Because  our  benchmark  filter  must  be  able  to  track  multiple  targets  in  the  presence  of 
clutter  and  missed  detections,  we  will  implement  the  JPDA  algorithm  to  perform  data 
association in our IMM-EKF tracker.


CHAPTER 6


PARTICLE  FILTER-BASED 
TRACKING  AND 
CLASSIFICATION 


In Chapter 4, we presented the sequential importance sampling (SIS) algorithm upon
which particle filters are based. We detailed two specific variants: a generic particle
filter that used the prior as its proposal distribution (see Section 4.3.1) and an auxiliary
particle filter whose characterization variable was drawn from the prior (see
Section 4.3.4). In Chapter 7, we will implement the flight model from this chapter
using both of these algorithms. Because sampling from the prior $p(\mathbf{x}_k \mid \mathbf{x}_{k-1})$ is equivalent
to driving the state model $f_k(\mathbf{x}_{k-1}, \mathbf{u}_k)$ with a sample from the process noise,
it suffices to specify our state and measurement models. We follow the organization
of the previous chapter and divide our discussion into sections on the state model, the
measurement model, and data association.
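
The equivalence between sampling from the prior and driving the state model with process noise can be sketched in a few lines. The scalar model, the noise level, and the function names below are placeholders chosen for illustration; they are not the flight model developed in this chapter.

```python
import random

def propagate(particles, f, sample_noise):
    """Draw x_k^i ~ p(x_k | x_{k-1}^i) by pushing each particle through the state model."""
    return [f(x, sample_noise()) for x in particles]

# Hypothetical scalar model x_k = 0.9 x_{k-1} + u_k with u_k ~ N(0, 0.5^2).
def f(x_prev, u):
    return 0.9 * x_prev + u

rng = random.Random(0)
particles = [rng.gauss(0.0, 1.0) for _ in range(1000)]
particles = propagate(particles, f, lambda: rng.gauss(0.0, 0.5))
```

Nothing in `propagate` requires the model to be linear or the noise to be Gaussian; any state model that can be simulated forward can be used in its place.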


6.1  State  Model  for  the  Particle  Filter 

The sequential importance sampling framework provides an impressive amount of flexibility
to the system designer. It is a straightforward matter for particle filters to model
systems that are neither linear nor Gaussian. As such, we are free to use state models
of substantial complexity. With this freedom, it makes sense to adopt a model that is
as closely matched to the actual state dynamics as possible. Because we are primarily
interested in tracking aircraft, this means finding an aerodynamically valid flight
model. This is an important point. Unlike the extended Kalman filter, which adopts a
Gaussian model because that is all the algorithm will allow, a particle filter is free to
use the correct state model, whenever it is known.

For our purposes, an airplane can be modeled as a rigid body with six degrees of
freedom [72]. Three degrees are needed to specify its location in space (or rather, the
location of a reference point, such as its center of mass). The other three degrees are
needed to specify the angular orientation of the aircraft with respect to a fixed reference
frame. Its motion is then completely specified by its time-varying location and orientation.
Although we refrain from specifying the components of the state vector just
yet, it is clear that we need a state model that accurately predicts both the translational
and rotational motion of an airplane. The state vector is then implicitly defined as the
set of variables needed to accomplish this task. Of course, because translational and
rotational motions are caused by forces and torques, respectively, our modeling efforts
must focus on these. Before we can do so, though, we first introduce the reference
frames and aerodynamic angles that we will need in our discussion.

Figure 6.1: (a) Example of a receiver-centered, Earth-fixed (inertial) frame. (b) Example
of a body-centered frame. (c) Definition of the angle of attack $\alpha$ and the sideslip
angle $\beta$.

6.1.1  Reference  frames  and  aerodynamic  angles 

One  of  the  difficulties  with  rigid-body  models  is  the  need  to  work  in  different  frames. 
In  every  dynamics  problem,  there  must  be  an  inertial  reference  frame  [73].  This  frame 
is  fixed  relative  to  the  stars.  In  an  inertial  frame,  the  net  acceleration  of  an  object  can 
be found using Newton’s second law, $\mathbf{f} = m\mathbf{a}$, where $m$ is the mass of the object and $\mathbf{f}$
is the sum of all external forces acting upon it. Note that Newton’s second law does not
hold  if  the  reference  frame  itself  is  rotating  or  accelerating.  In  most  problems  involving 
airplane  dynamics,  the  rotation  of  the  Earth  (relative  to  an  inertial  frame)  is  negligible, 
and  an  Earth-fixed  frame  can  be  used  in  place  of  an  inertial  frame. 

An  Earth-fixed  frame  is  simply  a  reference  frame  whose  origin  is  fixed  either  inside 
or  on  the  surface  of  the  Earth.  A  natural  choice  for  the  origin  in  an  Earth-fixed  frame  is 
the location of the radar receiver. Following common practice, we will use a receiver-centered
Earth-fixed frame as our inertial frame. The unit vectors in this frame will
be denoted as $\mathbf{i}_x$, $\mathbf{i}_y$, and $\mathbf{i}_z$. It is then standard practice to define the Earth-fixed axes
such that $\mathbf{i}_x$ points north, $\mathbf{i}_y$ points east, and $\mathbf{i}_z$ points down (towards the center of the
Earth). An example of a receiver-centered Earth-fixed frame is shown in Figure 6.1(a).

A second type of reference frame that we will need is a body-centered (or body-fixed)
frame. As the name implies, the origin of this frame is fixed to some point within
the aircraft, typically its center of mass. The unit vectors in this frame will be denoted
as $\mathbf{b}_x$, $\mathbf{b}_y$, and $\mathbf{b}_z$. It is common practice to orient the body axes such that $\mathbf{b}_x$ points out
the nose of the plane, $\mathbf{b}_y$ points to the pilot’s right, and $\mathbf{b}_z$ points through the plane’s
bottom. Furthermore, the rotational movement of the aircraft is usually specified in the
body frame. Rotation about the $\mathbf{b}_x$, $\mathbf{b}_y$, and $\mathbf{b}_z$ axes is typically referred to as roll, pitch,
and yaw, respectively. An example of a body-centered frame is shown in Figure 6.1(b).

Finally, there is one more frame that we need to consider. This frame is referred
to as the wind frame. Its origin is at the center of mass of the vehicle, but its $x$-axis is
directed along the velocity vector $\mathbf{v}$, which generally does not coincide with $\mathbf{b}_x$. The
$z$-axis of the wind frame is typically defined to lie in the $\mathbf{b}_x\mathbf{b}_z$-plane. Because the wind
frame is not a convenient reference frame in which to work, we will not define unit
vectors for it. Instead, we define two angles that can be used to align the wind frame
with the body frame. These angles are shown in Figure 6.1(c). The angle of attack is
defined as

$$\alpha = \tan^{-1}\left( v_z^b / v_x^b \right), \tag{6.1}$$

where $v_x^b$ denotes the $x$-component of the velocity vector expressed in the body frame.
The sideslip angle is defined as

$$\beta = \sin^{-1}\left( v_y^b / v \right), \tag{6.2}$$

where $v$ is the speed of the aircraft (i.e., $v = \|\mathbf{v}\|$). These angles, which are fundamentally
important in determining the forces that act upon an aircraft, are typically referred
to as the aerodynamic angles.1
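
A minimal sketch of (6.1) and (6.2) follows. The function name and the sample velocity are illustrative assumptions; `math.atan2` is used in place of a plain arctangent, which gives the same value whenever the forward component of velocity is positive.

```python
import math

def aerodynamic_angles(v_body):
    """Return (alpha, beta) in radians from the body-frame velocity (vx, vy, vz)."""
    vx, vy, vz = v_body
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    alpha = math.atan2(vz, vx)      # equals atan(vz / vx) for forward flight, vx > 0
    beta = math.asin(vy / speed)    # sideslip angle, (6.2)
    return alpha, beta

# Hypothetical body-frame velocity in m/s: mostly forward, slightly downward.
alpha, beta = aerodynamic_angles((100.0, 0.0, 10.0))
```

With no lateral component of velocity the sideslip angle is zero, and the angle of attack in this example is $\tan^{-1}(0.1)$, or roughly 5.7 degrees.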

We have specified the three reference frames that we will need to derive our state
model. We now discuss how vectors, such as velocity, can be transformed from one
frame to another. To avoid confusion, we will use the superscripts $b$ and $w$ to identify
vectors that are expressed in the body and wind frames, respectively. Vectors that
are expressed in inertial coordinates will be written without a superscript. To begin,
the orientation of any frame with respect to another is given by three angles. We are
primarily interested in the sequence of right-handed rotations about $\mathbf{i}_z$, $\mathbf{i}_y$, and $\mathbf{i}_x$ (in
that order) that bring the inertial frame into alignment with the body-centered frame.
This is a particular case of Euler angles. The necessary rotations about $\mathbf{i}_z$, $\mathbf{i}_y$, and
$\mathbf{i}_x$ will be referred to as the heading, elevation, and bank angles, respectively, of the
aircraft. We follow common practice and denote heading as $\psi$, elevation as $\theta$, and
bank as $\phi$. Given these three angles, we can then define the orthogonal transformation
(6.4)


where we used the shorthand $c$ for cos and $s$ for sin to make (6.4) more readable. In
each case, the subscript accompanying $c$ or $s$ denotes the argument of the corresponding
trigonometric function. The matrix $\mathbf{O}^{bi}$ transforms vectors from the inertial frame into
the body frame. Thus, the velocity vector in body-centered coordinates is $\mathbf{v}^b = \mathbf{O}^{bi}\mathbf{v}$.
Furthermore, because $\mathbf{O}^{bi}$ is orthogonal, we have $(\mathbf{O}^{bi})^{-1} = (\mathbf{O}^{bi})^{\top}$. This means that
$\mathbf{O}^{bi}$ can also be used to transform body-centered vectors into inertial coordinates via
$\mathbf{x} = (\mathbf{O}^{bi})^{\top}\mathbf{x}^b$.

1 Because both conventions are widespread, we will use $\beta$ to represent the clutter density and the sideslip
angle. In all cases, the intended meaning should be clear from the context.

In addition to $\mathbf{O}^{bi}$, it is also helpful to define a transformation that maps wind
coordinates into body-centered coordinates. From Figure 6.1(c), we see that $(-\beta, \alpha, 0)$
is the sequence of rotations that bring the wind frame into alignment with the body
frame. Substituting $(-\beta, \alpha, 0)$ for $(\psi, \theta, \phi)$ in (6.4) yields
(6.5)


A vector in the wind frame can then be expressed in the body frame via $\mathbf{x}^b = \mathbf{O}^{bw}\mathbf{x}^w$.
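
The two transformations can be sketched as follows. The matrix written out in `euler_to_matrix` is the standard direction cosine matrix for a heading-elevation-bank rotation sequence; we assume, but cannot confirm from the text as it stands, that it coincides with the matrix in (6.4). All function names and sample values are illustrative.

```python
import math

def euler_to_matrix(psi, theta, phi):
    """Inertial-to-body rotation for the z-y-x (heading, elevation, bank) sequence."""
    c, s = math.cos, math.sin
    return [
        [c(theta) * c(psi), c(theta) * s(psi), -s(theta)],
        [-c(phi) * s(psi) + s(phi) * s(theta) * c(psi),
         c(phi) * c(psi) + s(phi) * s(theta) * s(psi),
         s(phi) * c(theta)],
        [s(phi) * s(psi) + c(phi) * s(theta) * c(psi),
         -s(phi) * c(psi) + c(phi) * s(theta) * s(psi),
         c(phi) * c(theta)],
    ]

def wind_to_body(alpha, beta):
    """Wind-to-body rotation: substitute (-beta, alpha, 0) for (psi, theta, phi)."""
    return euler_to_matrix(-beta, alpha, 0.0)

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]

# Round trip: inertial -> body -> inertial should recover the original vector.
o_bi = euler_to_matrix(math.radians(30), math.radians(10), math.radians(-5))
v = [120.0, 15.0, -3.0]
v_back = matvec(transpose(o_bi), matvec(o_bi, v))

# A velocity along the wind x-axis, expressed in body coordinates.
v_body = matvec(wind_to_body(0.1, 0.05), [100.0, 0.0, 0.0])
```

Applying (6.1) and (6.2) to `v_body` returns the angles 0.1 and 0.05 that were used to build the wind-to-body matrix, which is a useful consistency check on the sign conventions.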


6.1.2  Modeling  the  translational  motion  of  an  airplane 

In  the  previous  section,  we  developed  the  geometric  framework  needed  to  express  the 
forces  and  torques  that  act  upon  an  aircraft  during  flight.  In  this  section,  we  provide 
an aerodynamically accurate description of the relevant forces. The issue of torque is
taken  up  in  the  next  section. 

Broadly  speaking,  there  are  three  types  of  forces  that  act  upon  an  airplane: 

1. weight due to gravity,

2.  the  propulsive  force  (or  thrust)  due  to  the  engines, 

3.  the  aerodynamic  force  due  to  forward  motion  through  the  atmosphere. 

The magnitude of the weight vector is $mg$, where $m$ is the mass of the target and $g$ is
the acceleration due to gravity. The direction of the weight vector is most naturally expressed
in inertial coordinates where it points downward, $\mathbf{w} = mg\,\mathbf{i}_z$. The magnitude
of the force due to thrust is controlled by the pilot. For a given aircraft, it will have both
upper and lower bounds. The upper bound is determined by the maximum output of
the engines. The lower bound is dictated by the minimum velocity at which adequate
lift is generated by the wings to remain airborne. We denote the magnitude of the thrust
vector as $T$ and assume that it is directed along $\mathbf{b}_x$, the body-centered $x$-axis. The final
item from the list, the aerodynamic force, does not lie along any of the reference axes
that we have defined so far. We must consider each of its components individually.

The components of the aerodynamic force are, in general, proportional to $\rho v^2 S$,
where $\rho$ is the density of air at the aircraft’s altitude, $v$ is its speed, and $S$ is a reference
area, usually that of the wings. The aerodynamic force is most conveniently defined in

Figure 6.2: Illustration of forces relevant to flight. The cross-wind component is not
shown. (It would be directed perpendicular to the page.)

where $D$, $C$, and $L$ are the drag, cross-wind, and lift components, respectively, and $C_D$,
$C_C$, and $C_L$ are their nondimensional equivalents. Because $\mathbf{v}$ points along the $x$-axis of
the wind frame, we see that drag and lift are defined as expected. Namely, $D$ is defined
in opposition to the forward motion of the aircraft, while $L$ is defined (approximately)
perpendicular to the wings. Figure 6.2 illustrates the angular relationship between all
of the relevant forces in our discussion. Of course, the definitions of these components,
in and of themselves, are nothing significant. We are free to express the aerodynamic
force vector in any frame of reference that we like. The advantage of this specific
parameterization, though, is the simple forms that the coefficients $C_D$, $C_C$, and $C_L$
assume. For subsonic and supersonic flight, the following approximations hold over a
useful range of $\alpha$ [8,73]:

$$C_L \approx C_{L\alpha} \, \alpha, \tag{6.7}$$

$$C_D \approx C_{D\min} + K_D C_L^2, \tag{6.8}$$
where the three terms $C_{L\alpha}$, $C_{D\min}$, and $K_D$ are principally functions of the aircraft’s
shape and speed. Because we are interested in maneuvers with limited variation in
speed, we will ignore this aspect of their dependence. However, we will not ignore the
dependence of $C_{L\alpha}$, $C_{D\min}$, and $K_D$ on the aircraft itself. In fact, variation of these
coefficients between target classes is precisely why classification can improve tracking
performance. If a target can be identified, the correct coefficients can be used within
the motion model. As for the cross-wind component from (6.6), we take $C_C \approx 0$,
which is a valid assumption as long as the sideslip angle $\beta$ is relatively small.
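
The approximations in (6.7) and (6.8) can be sketched as below. The function names and the coefficient values are made up for illustration and do not describe any real aircraft. The conversion to forces uses the conventional dynamic pressure $\frac{1}{2}\rho v^2$; the factor of one half is an assumption here, since the text above states only that the components are proportional to $\rho v^2 S$.

```python
def lift_drag_coefficients(alpha, cl_alpha, cd_min, k_d):
    """C_L and C_D from (6.7) and (6.8); alpha in radians."""
    c_l = cl_alpha * alpha
    c_d = cd_min + k_d * c_l ** 2
    return c_l, c_d

def lift_drag_forces(rho, v, s_ref, c_l, c_d):
    """Lift and drag magnitudes (N) using the dynamic pressure 0.5 * rho * v**2."""
    q = 0.5 * rho * v ** 2
    return q * s_ref * c_l, q * s_ref * c_d

# Made-up class coefficients and flight condition.
c_l, c_d = lift_drag_coefficients(alpha=0.05, cl_alpha=5.0, cd_min=0.02, k_d=0.05)
lift, drag = lift_drag_forces(rho=1.225, v=200.0, s_ref=30.0, c_l=c_l, c_d=c_d)
```

Two target classes with different values of `cl_alpha`, `cd_min`, `k_d`, or `s_ref` would produce different forces at the same angle of attack and speed, which is the mechanism by which class information enters the motion model.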

We are now in a position to express the net force exerted upon an aircraft due to
gravity, thrust, lift, and drag. It is most useful to do this in inertial coordinates. Using
the transformations $\mathbf{O}^{bi}$ and $\mathbf{O}^{bw}$, we have

$$\mathbf{f} = (\mathbf{O}^{bi})^{\top} \left( T\,\mathbf{b}_x + \mathbf{O}^{bw}\mathbf{a}^w \right) + \mathbf{w} \tag{6.9}$$

where $C_L$ and $C_D$ are provided by (6.7) and (6.8). With this equation, we now have
an excellent model for the net force exerted on an aircraft during flight. It is important
to note that (6.9) is substantially more than just a rigid-body motion model; because
of $C_L$ and $C_D$, it is designed explicitly for flight. Thus, to calculate $\mathbf{f}$, two sets of
parameters are needed. First are the class attributes. This set is defined as


$$A_c = \{ C_{L\alpha}, C_{D\min}, K_D, S, m \}, \tag{6.10}$$

where the subscript $c$ identifies the class represented by the parameters. This set is completely
determined, given the identity of the target. For our simulations in Chapter 7,
we will assume that all five are time-invariant.2 Second are the kinematic attributes
$\{p_z, \mathbf{v}, \psi, \theta, \phi, T\}$. Note that $p_z$ is only required because $\rho$, the density of air, is a
function of altitude.3 Also, given the Euler angles $(\psi, \theta, \phi)$, the aerodynamic angles
$(\alpha, \beta)$ can be derived from $\mathbf{v}$ using (6.1) and (6.2).
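
To show how the class attributes and the kinematic attributes combine, the following sketch evaluates the first line of (6.9) end to end. Several pieces are assumptions and should be read as such: the rotation matrix is the standard heading-elevation-bank form, the wind-frame aerodynamic force is taken to be $(-D, 0, -L)$ with the conventional factor $\frac{1}{2}\rho v^2 S$, the altitude is passed as a separate positive number rather than derived from $p_z$, and every name and numerical value is hypothetical.

```python
import math

def rotation(psi, theta, phi):
    """Rotation matrix for a z-y-x sequence of angles (psi, theta, phi)."""
    cps, sps = math.cos(psi), math.sin(psi)
    cth, sth = math.cos(theta), math.sin(theta)
    cph, sph = math.cos(phi), math.sin(phi)
    return [
        [cth * cps, cth * sps, -sth],
        [-cph * sps + sph * sth * cps, cph * cps + sph * sth * sps, sph * cth],
        [sph * sps + cph * sth * cps, -sph * cps + cph * sth * sps, cph * cth],
    ]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def air_density(altitude_m):
    """Troposphere approximation for dry air (kg/m^3), altitude in meters."""
    temperature = 288.15 - 0.0065 * altitude_m
    pressure = 101325.0 * (1.0 - 0.0065 * altitude_m / 288.15) ** 5.256
    return pressure / (287.05 * temperature)

def net_force(attrs, altitude_m, v_inertial, euler, thrust, g=9.81):
    """Net force (N) in north-east-down inertial coordinates."""
    o_bi = rotation(*euler)
    v_b = matvec(o_bi, v_inertial)                    # velocity in body coordinates
    speed = math.sqrt(sum(c * c for c in v_b))
    alpha = math.atan2(v_b[2], v_b[0])                # (6.1), assuming v_b[0] > 0
    beta = math.asin(v_b[1] / speed)                  # (6.2)
    c_l = attrs["cl_alpha"] * alpha                   # (6.7)
    c_d = attrs["cd_min"] + attrs["k_d"] * c_l ** 2   # (6.8)
    q_s = 0.5 * air_density(altitude_m) * speed ** 2 * attrs["s_ref"]
    a_w = [-q_s * c_d, 0.0, -q_s * c_l]               # assumed wind-frame force, C_C = 0
    a_b = matvec(rotation(-beta, alpha, 0.0), a_w)    # wind frame to body frame
    f = matvec(transpose(o_bi), [thrust + a_b[0], a_b[1], a_b[2]])
    f[2] += attrs["mass"] * g                         # weight acts along i_z (down)
    return f

# Made-up class attributes; not taken from any real aircraft.
attrs = {"cl_alpha": 5.0, "cd_min": 0.02, "k_d": 0.05, "s_ref": 30.0, "mass": 9000.0}
f = net_force(attrs, altitude_m=0.0, v_inertial=[200.0, 0.0, 0.0],
              euler=(0.0, 0.0, 0.0), thrust=0.0)
```

In this deliberately simple case (level attitude, zero angle of attack, no thrust) the result reduces to drag opposing the motion and weight pointing down, which makes the output easy to check by hand.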

At this point, it is now possible to summarize our entire argument for the necessity
of a joint approach to tracking and classification in an RCS-based system. First, we
consider the classifier. Within the VHF band, we have seen that the variation in RCS
of a moving airplane can be a powerful means of target discrimination. In Section 2.2,
we showed that RCS is a function of the incident and scattered wavevectors, $\mathbf{k}_i$ and $\mathbf{k}_s$
(see Equation (2.1)). To determine the directions of these vectors, the classifier needs
an estimate of the target’s position and angular orientation, which is precisely the output of
the tracker. Second, consider the tracking system. To estimate the flight path of an
aircraft, it is necessary to estimate the net force exerted on the airframe. As shown in
(6.9), an accurate model for this force requires knowledge of the parameter set $A_c$. As
such, the tracker needs an estimate of the correct class $c$, which is precisely the output of the
classifier.

2 For example, we will ignore the fact that the mass $m$ of the target decreases slightly with time as fuel is
consumed.

3 The density of air in kg/m$^3$ is given by $\rho = \text{Pressure}/(287.05 \cdot \text{Temperature})$, where the pressure is
required in Pa and the temperature in kelvins. In the troposphere (sea level to 11 km), the following
approximations are valid for dry air and $p_z$ in meters: $\text{Pressure} \approx 101325\,(1 - 0.0065\,p_z/288.15)^{5.256}$ and
$\text{Temperature} \approx 288.15 - 0.0065\,p_z$.

6.1.3 Modeling the rotational motion of an airplane

In  the  previous  section,  we  developed  an  aerodynamically  valid  model  for  the  force 
exerted  on  an  aircraft  during  flight.  In  this  section,  we  discuss  the  issue  of  torque.  At 
the  outset,  it  is  very  important  that  we  recognize  a  key  difference  between  the  forces 
and  torques  that  affect  flight.  For  the  most  part,  the  forces  on  the  aircraft  have  external 
origins. For example, $\mathbf{w}$ results from the action of gravity upon the aircraft. Likewise,
$\mathbf{a}$ results from the action of the atmosphere upon the plane. The only force that is
generated by the plane itself is the propulsive force $T\mathbf{b}_x$, which is controlled by the
pilot via the throttle.4 To a good approximation, this is not the case with the torque
exerted on an aircraft. In fact, airplanes are designed with the explicit intent of resisting
rotations caused by external influences. Stability against undesired pitching and yawing
is achieved by the horizontal and vertical surfaces of the tail, respectively. Stability
against undesired rolls can be achieved by giving the wings of an aircraft a slight upward
tilt. In a stable aircraft, the pilot can actually release the controls, and the vehicle
will continue in straight and level flight [74].

Instead  of  having  external  causes,  the  torques  that  produce  rolling,  pitching,  or 
yawing  moments  are  under  the  direct  control  of  the  pilot.  Rolling  motion  (rotation 
about $\mathbf{b}_x$) is controlled by the ailerons, the hinged surfaces near the ends of the wings. Pitching
motion (rotation about $\mathbf{b}_y$) is controlled by the elevators, typically extensions on the
horizontal stabilizer of the tail. Finally, yawing motion (rotation about $\mathbf{b}_z$) is controlled
by the rudder. Therein lies the difference with modeling torque. Because its origin lies
with the aircraft itself, modeling torque is really an exercise in modeling pilot intent.
This is a distinct advantage because it means that there should always be a certain
“intelligence” to the rotational motion of an airplane. The influence of pilot intent
upon rotational motion can be quantified by defining a set of rotation modes. Each
rotation mode corresponds to a different type of flight, such as a turn or a climb. With
these modes, the rotational motion of an aircraft is then nothing more than the result of
an unknown sequence of maneuvers $\{\gamma_1, \gamma_2, \ldots, \gamma_k\}$ commanded by the pilot, where
$\gamma_k$ is the mode that the target operates in from $k\Delta t$ until $(k+1)\Delta t$.

4Of course, this is not to say that the pilot does not have some control over the lift and drag forces. For
example, because $L$ is proportional to both $\alpha$ and $v^2$, the lift vector can be changed by altering either the
angle of attack or the speed of the airplane.

Table 6.1: Euler Angles as a Function of Rotation Mode and Pilot Input

Mode   Pilot Input                  Bank Angle $\phi$             Elevation Angle $\theta$          Heading Change $d\psi/dt$
CV     (none)                       $0$                           $\alpha_{ss}$                     $0$
PT     $\Delta\theta^*$             $0$                           $\alpha_{ss} + \Delta\theta^*$    $0$
CT     $\omega^*, \Delta\theta^*$   $\tan^{-1}(\omega^* v / g)$   $\alpha_{ss} + \Delta\theta^*$    $\omega^*$

Maneuvers can be classified as either longitudinal or lateral. A longitudinal maneuver
is any that occurs with zero bank angle ($\phi = 0$). Longitudinal maneuvers are
initiated by pitching moments. For example, to initiate a climb, the pilot will pitch the
aircraft up to increase the angle of attack $\alpha$ and generate more lift (recall that $L \propto \alpha$).
Maneuvers that involve nonzero bank angles are termed lateral. They are initiated
through yawing and rolling moments. The useful assumption of coordinated flight is
often made when dealing with lateral maneuvers. Coordinated flight is defined as flight
in which the control surfaces (ailerons, elevators, and rudder) are manipulated such that
the following two goals are attained. First, the velocity vector is kept in the $\mathbf{b}_x\mathbf{b}_z$-plane
(i.e., $\beta \approx 0$), so that air flows past the cabin and not into it. This minimizes drag and
maximizes lift [74]. Second, the net force on objects within the cabin is directed along
the $\mathbf{b}_z$-axis. This means that coffee in a cup onboard the turning aircraft will not spill.
It also minimizes the fatigue of being thrown from side to side during lateral maneuvering
[8]. For our state model, we will assume that the rotational motion of the target can
be described using two longitudinal modes and one (coordinated) lateral mode. The
three modes are

•  mode  1  (CV):  constant-velocity  fixed-altitude  flight, 

•  mode  2  (PT):  longitudinal  climb  or  descent  (i.e.,  pitch)  with  thrust  increase, 

•  mode  3  (CT):  constant-speed  coordinated  turn. 

By  concatenating  these  three  modes,  this  small  collection  is  capable  of  modeling  a  wide 
range  of  real  trajectories.  Note,  we  have  used  two  labels  (CV  and  CT)  that  already 
identify  models  within  our  IMM-EKF  implementation.  This  is  meant  to  signify  that 
these  particle  filter  rotation  modes  represent  the  same  type  of  flight  trajectory  as  the 
corresponding  IMM  filters.  However,  we  must  emphasize  that  the  models  themselves 
are  not  the  same. 

At this point, we could proceed by formulating an expression for the torque required
to produce the rotational motion associated with each mode. However, this is
unnecessary. Because our state model needs a description of the time evolution of the
Euler angles (for $\mathbf{O}_{bi}$ in (6.9) and RCS table look-up), the only use we would have for
the net torque is to compute $(\psi, \theta, \phi)$. As such, it makes more sense to model the Euler
angles directly for each rotation mode and bypass the issue of torque altogether.

Table  6.1  summarizes  our  model  for  the  Euler  angles  during  each  rotation  mode. 
Before  considering  each  row,  a  few  general  comments  are  in  order.  First,  we  note  that 
the constraint on heading angle is defined in terms of its derivative, not its value. CV
flight, for example, requires that the heading angle be maintained at its initial value,
whatever that value may be. Second, starred variables represent deterministic, but
unknown, pilot inputs that drive the rotational motion of the system. At the beginning
of a maneuver, these variables are supplied to the aircraft via the control surfaces. We
model the response of the bank and elevation angles ($\phi$ and $\theta$) to these inputs as an
underdamped second-order system. Third, $\alpha_{ss}$ is the steady-state angle of attack needed
to maintain level flight at some nominal speed. Its inclusion in the table highlights that,
because $L \propto \alpha$, a positive angle of attack must be maintained at all times. For level
flight, the velocity vector is parallel to the inertial $\mathbf{i}_x\mathbf{i}_y$-plane. In this case, the elevation
angle is equal to the angle of attack. We now proceed to justify the entries for each
mode in Table 6.1.

Figure 6.3: Force diagram during a constant-speed coordinated turn.


The CV model is a constant-velocity fixed-altitude mode. Because it models longitudinal
flight, the bank angle must be zero, and there can be no change in heading. As
discussed above, the elevation angle is equal to the steady-state angle of attack needed
to generate enough lift to maintain level flight at the desired (constant) speed. The
PT mode is intended to model longitudinal climbs and descents. In this mode, the pilot
pitches the aircraft up or down by $\Delta\theta^*$, which changes the elevation angle by the
same amount. The resulting change in the angle of attack causes the target to climb if
$\Delta\theta^* > 0$ and descend if $\Delta\theta^* < 0$. In this mode, the pilot is also allowed to increase the
thrust force in order to increase the aircraft’s speed. (This is why there is a ‘T’ in the PT
designation.) Finally, the CT mode is intended to model constant-speed fixed-altitude
turns. As a lateral maneuver, the aircraft can roll, pitch, and yaw. However, because
we have adopted a coordinated flight model for this mode, the resulting trajectory is
primarily a function of $\omega^*$, the desired turn-rate. The relation for the bank angle can be
derived by considering Figure 6.3. For a constant-speed turn, the thrust and drag forces
will cancel each other out (approximately), and we are left with the forces due to lift
and gravity. A fixed-altitude turn requires that the net acceleration be in the $\mathbf{i}_x\mathbf{i}_y$-plane.
From Figure 6.3, this implies

$$ L\cos\phi = mg \;\Longrightarrow\; L = \frac{mg}{\cos\phi}. \tag{6.11} $$

Because the magnitude of the centripetal acceleration is $\omega^* v$, we also have

$$ L\sin\phi = m\,\omega^* v. \tag{6.12} $$

Substituting $L$ from (6.11) into (6.12) yields

$$ g\tan\phi = \omega^* v \;\Longrightarrow\; \phi = \tan^{-1}\!\left(\frac{\omega^* v}{g}\right), \tag{6.13} $$

which is the equation for the bank angle in Table 6.1. Because a portion of the lift force
is spent upon lateral movement in a turn, a pilot will typically pitch up the aircraft to
increase the angle of attack and generate more lift. This practice is allowed in our CT
mode through the addition of the input $\Delta\theta^*$. Similar to the PT mode, $\Delta\theta^*$ affects the
elevation angle, but in this case we require $\Delta\theta^* > 0$.
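
As a quick numerical check of (6.11)-(6.13), the following Python sketch computes the bank angle for a commanded turn-rate and the lift-to-weight ratio that the turn demands. The function names and the value $g = 9.81$ m/s^2 are assumptions made for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def coordinated_turn_bank_angle(turn_rate: float, speed: float, g: float = G) -> float:
    """Bank angle (rad) from (6.13) for a turn-rate in rad/s and a speed in m/s."""
    return math.atan(turn_rate * speed / g)


def lift_to_weight_ratio(bank_angle: float) -> float:
    """Ratio L/(mg) required by (6.11) to hold altitude at the given bank angle."""
    return 1.0 / math.cos(bank_angle)


if __name__ == "__main__":
    omega = math.radians(3.0)   # a 3 deg/s turn
    v = 100.0                   # m/s
    phi = coordinated_turn_bank_angle(omega, v)
    print(f"bank angle: {math.degrees(phi):.2f} deg")
    print(f"lift / weight: {lift_to_weight_ratio(phi):.3f}")
```

The second function makes the need for $\Delta\theta^* > 0$ concrete: any nonzero bank angle requires $L > mg$, so the angle of attack must rise above $\alpha_{ss}$ if speed is held constant.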

At  this  point,  it  is  helpful  to  summarize  our  development  thus  far.  We  began  by 
recognizing  that  the  rotational  motion  of  an  airplane  was  generated  almost  entirely  by 
pilot  command.  As  such,  it  is  appropriate  to  model  the  rotational  motion  using  a  set 
of flight modes for the most common maneuvers. We chose three modes: constant-velocity,
pitch/thrust, and coordinated turn. Then, recognizing that calculation of $\mathbf{f}$
only required knowledge of the Euler angles, we avoided the issue of torque altogether
and instead modeled $(\phi, \theta, \psi)$ directly, as shown in Table 6.1. Now, all that remains is
for us to specify a model for the mode sequence itself. For this, we assume that $\{\gamma_k\}$
evolves according to a first-order Markov chain with transition matrix $P$. The elements
of $P$ are defined as $P(i, j) = \Pr(\gamma_k = j \mid \gamma_{k-1} = i)$. This completes our model for the
rotational motion of an airplane.
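
The first-order Markov model for the mode sequence can be illustrated with a short Python sketch. The numerical entries of the transition matrix below are hypothetical placeholders chosen only so that each row sums to one; they are not values from our experiments.

```python
import numpy as np

MODES = ("CV", "PT", "CT")

# Hypothetical transition matrix: P[i, j] = Pr(gamma_k = j | gamma_{k-1} = i).
P = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.85, 0.05],
    [0.10, 0.05, 0.85],
])


def sample_mode_sequence(P, initial_mode, num_scans, rng):
    """Draw gamma_0, ..., gamma_{num_scans} from a first-order Markov chain."""
    modes = [initial_mode]
    for _ in range(num_scans):
        # The row of P indexed by the previous mode is the distribution of the next mode.
        modes.append(int(rng.choice(len(P), p=P[modes[-1]])))
    return modes


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = sample_mode_sequence(P, initial_mode=0, num_scans=20, rng=rng)
    print(" ".join(MODES[m] for m in sequence))
```

Large diagonal entries produce long dwell times in each mode, which matches the picture of a pilot who holds a maneuver for many scans before commanding a new one.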

6.1.4  The  complete  flight  model 

In  this  section,  we  combine  our  previous  results  in  order  to  arrive  at  our  complete  flight 
model.  So  far  in  this  chapter,  we  have  largely  neglected  the  sequential  nature  of  the 
problem  and  suppressed  the  scan  index  k  to  simplify  our  equations.  Now,  we  return 
to the primary problem of modeling the evolving state $\mathbf{x}_k$ of our system. To do this,
we first specify the components of $\mathbf{x}_k$. Based upon our models for translational and
rotational motion, the state vector at scan $k$ is defined as

$$ \mathbf{x}_k = \{\mathbf{p}_k, \mathbf{v}_k, \psi_k, \theta_k, \phi_k, \zeta, \gamma_k, \omega_k^*, \Delta\theta_k^*, T_k^*\}, \tag{6.14} $$

where $\mathbf{p}_k$ is the position vector, $\mathbf{v}_k$ is the velocity vector, $(\psi_k, \theta_k, \phi_k)$ are the Euler
angles, $\zeta$ is the class, $\gamma_k$ is the current maneuver mode (i.e., $\gamma_k \in \{\text{CV}, \text{PT}, \text{CT}\}$), and
$(\omega_k^*, \Delta\theta_k^*, T_k^*)$ are the pilot inputs for $\gamma_k$. We note that the class $\zeta$ is not subscripted
by $k$ because it does not vary with time. Furthermore, some of $(\omega_k^*, \Delta\theta_k^*, T_k^*)$ are
fixed, depending on the current mode (e.g., $\omega_k^* = 0$ if $\gamma_k = \text{CV}$; otherwise the maneuver
would be a turn). Nonetheless, because $\mathbf{x}_k$ is 14-dimensional, it would seem
that sampling the state space is sure to be computationally daunting. To this concern,
we point out that the effective state space is actually five-dimensional. To explain
what we mean by “effective,” assume that the initial state $\mathbf{x}_0$ is known. Then, according
to our state model, the sequence $\mathbf{x}_{0:k}$ is completely determined by the variables
$\{\zeta, \gamma_{1:k}, \omega_{1:k}^*, \Delta\theta_{1:k}^*, T_{1:k}^*\}$. In words, if we are given the initial position, velocity, and
orientation of the target, knowledge of the class, mode sequence, and corresponding
pilot commands is sufficient to determine the state at all times. The other nine components
are just deterministic functions of the first five (and each other). Therefore, if the
initialization of the particle filter is sufficiently good, the majority of the subsequent
inference should focus upon $\{\zeta, \gamma_k, \omega_k^*, \Delta\theta_k^*, T_k^*\}$. These five variables quantify the
two key unknowns in any tracking problem: target class and pilot intent (i.e., “What is
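it?” and “Where is it?”).

Algorithm 5 Aerodynamically Accurate State Model
$\mathbf{x}_k$ = STATE_MODEL[$\mathbf{x}_{k-1}$]

•  Draw $\gamma_k \sim P(\gamma_{k-1}, :)$.
•  IF $\gamma_k = \gamma_{k-1}$
     -  Draw $\omega_k^* \sim \mathcal{N}(\omega_{k-1}^*, \sigma_\omega^2)$.
     -  Draw $\Delta\theta_k^* \sim \mathcal{N}(\Delta\theta_{k-1}^*, \sigma_{\Delta\theta}^2)$.
     -  Draw $T_k^* \sim \mathcal{N}(T_{k-1}^*, \sigma_T^2)$.
•  ELSE
     -  IF $\gamma_k$ = CV
          *  Set $(\omega_k^*, \Delta\theta_k^*, T_k^*) = (0, \alpha_{ss}, T_{ss})$.
     -  ELSEIF $\gamma_k$ = PT
          *  Set $\omega_k^* = 0$.
          *  Draw $\Delta\theta_k^* \sim U^{PT}_{\Delta\theta}(\zeta)$.
          *  Draw $T_k^* \sim U^{PT}_{T}(\zeta)$.
     -  ELSEIF $\gamma_k$ = CT
          *  Draw $\omega_k^* \sim U^{CT}_{\omega}(\zeta)$.
          *  Draw $\Delta\theta_k^* \sim U^{CT}_{\Delta\theta}(\zeta)$.
          *  Set $T_k^* = T_{ss}$.
     -  END IF
•  END IF
•  $\mathbf{x}_k$ = INTEGRATE[$\mathbf{x}_{k-1}, \gamma_k, \omega_k^*, \Delta\theta_k^*, T_k^*$]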

Figure  6.4:  Pseudo-code  description  of  our  state  model  for  flight. 
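
The mode and pilot-input draws of Figure 6.4 can be sketched in Python as follows. This is a sketch under stated assumptions, not our implementation: the function name `draw_pilot_inputs`, the dictionary keys, and every numerical limit are hypothetical. In the CV branch the sketch sets the pitch offset to zero, which reproduces the CV row of Table 6.1 when the elevation angle is later formed as $\alpha_{ss} + \Delta\theta$; that choice is also an assumption of the sketch.

```python
import numpy as np

CV, PT, CT = 0, 1, 2

# Hypothetical class-dependent supports (low, high) of the uniform input distributions.
LIMITS = {
    "pt_dtheta": (-0.10, 0.10),   # rad, climbs and descents
    "pt_thrust": (4.0e4, 8.0e4),  # N
    "ct_omega": (-0.10, 0.10),    # rad/s
    "ct_dtheta": (0.0, 0.05),     # rad, nonnegative during a turn
}


def draw_pilot_inputs(prev_mode, prev_inputs, P, limits, sigmas, thrust_ss, rng):
    """Return (mode, (omega, dtheta, thrust)) for the next scan."""
    mode = int(rng.choice(3, p=P[prev_mode]))
    if mode == prev_mode:
        # Same maneuver: perturb the previous commands with small Gaussian noise.
        inputs = tuple(float(rng.normal(u, s)) for u, s in zip(prev_inputs, sigmas))
    elif mode == CV:
        # Assumption of this sketch: zero pitch offset in level flight.
        inputs = (0.0, 0.0, thrust_ss)
    elif mode == PT:
        inputs = (0.0,
                  float(rng.uniform(*limits["pt_dtheta"])),
                  float(rng.uniform(*limits["pt_thrust"])))
    else:
        inputs = (float(rng.uniform(*limits["ct_omega"])),
                  float(rng.uniform(*limits["ct_dtheta"])),
                  thrust_ss)
    return mode, inputs
```

Passing a different `limits` dictionary for each class is how the class-dependent supports would enter; the call to the integration routine is omitted here because it needs the force model of (6.9).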


Our complete state model can be expressed mathematically as

$$ \mathbf{x}_k = \mathbf{f}^{(\gamma_k)}(\mathbf{x}_{k-1}, \mathbf{u}_k). \tag{6.15} $$

Unfortunately, there is no simple expression for the functions $\mathbf{f}^{(\gamma_k)}$. Instead, in
Figure 6.4, we provide a pseudo-code description of how $\mathbf{x}_k$ is obtained from $\mathbf{x}_{k-1}$. Each
scan, the model first determines if a mode change has occurred. If not, a small amount
of noise is added to the previous pilot inputs; if so, new pilot inputs are drawn. Note, we
use $P(\gamma_{k-1}, :)$ to denote the conditional distribution over the current mode given $\gamma_{k-1}$.
Steady-state values for the angle of attack and thrust are denoted by $\alpha_{ss}$ and $T_{ss}$,
respectively. These are the values needed to maintain level flight at the aircraft’s nominal
speed. Finally, $U^{PT}_{\Delta\theta}$, $U^{PT}_{T}$, $U^{CT}_{\omega}$, and $U^{CT}_{\Delta\theta}$ are uniform distributions from which the
pilot inputs are drawn.5 Uniform distributions are used to provide a maximally
noncommittal model of pilot intent. Furthermore, using distributions with finite support
to model pilot input is justified because the motion of any airplane will be limited by
its own maneuverability and propulsion. By allowing these distributions to depend on
class $\zeta$, the actual kinematic limitations of each airplane can be used, thereby providing
another means of class discrimination.

5In general, $U^{PT}_{\Delta\theta}$ and $U^{CT}_{\Delta\theta}$ are not the same. This is necessary because $\Delta\theta^* > 0$ during the CT mode,
while $\Delta\theta^*$ can be negative during the PT mode to permit a longitudinal descent.

Algorithm 6 State Integration Routine
$\mathbf{x}_k$ = INTEGRATE[$\mathbf{x}_{k-1}, \gamma_k, \omega_k^*, \Delta\theta_k^*, T_k^*$]

•  Store $\{\tilde{\mathbf{p}}, \tilde{\mathbf{v}}, \tilde\psi, \tilde\theta, \tilde\phi\} \leftarrow \{\mathbf{p}_{k-1}, \mathbf{v}_{k-1}, \psi_{k-1}, \theta_{k-1}, \phi_{k-1}\}$.
•  FOR $n = 1 : (\Delta t/\Delta_i)$
     -  $(\tilde\omega, \Delta\tilde\theta, \tilde T)$ = SECOND_ORDER[$\omega_k^*, \Delta\theta_k^*, T_k^*$]
     -  Determine $\mathbf{O}_{bi}$ using $\tilde\psi$, $\tilde\theta$, and $\tilde\phi$ in (6.4).
     -  Determine $\mathbf{f}$ using $\tilde{\mathbf{p}}$, $\tilde{\mathbf{v}}$, and $\mathbf{O}_{bi}$ in (6.9).
     -  Calculate $\mathbf{v} = \tilde{\mathbf{v}} + (\Delta_i/m)\,\mathbf{f}$.
     -  Calculate $\mathbf{p} = \tilde{\mathbf{p}} + \Delta_i\,\tilde{\mathbf{v}}$.
     -  Calculate $\psi = \tilde\psi + \Delta_i\,\tilde\omega$.
     -  Calculate $\theta = \alpha_{ss} + \Delta\tilde\theta$.
     -  Determine $\phi$ using $\tilde\omega$ and $v = \|\tilde{\mathbf{v}}\|$ in (6.13).
     -  Store $\{\tilde{\mathbf{p}}, \tilde{\mathbf{v}}, \tilde\psi, \tilde\theta, \tilde\phi\} \leftarrow \{\mathbf{p}, \mathbf{v}, \psi, \theta, \phi\}$.
•  END FOR
•  $\mathbf{x}_k = \{\tilde{\mathbf{p}}, \tilde{\mathbf{v}}, \tilde\psi, \tilde\theta, \tilde\phi, \zeta, \gamma_k, \omega_k^*, \Delta\theta_k^*, T_k^*\}$

Figure 6.5: Pseudo-code description of the state integration routine.

The last line in the pseudo-code description of Figure 6.4 is a call to a numerical
integration routine. This routine is responsible for the actual computation needed
to advance the state from time index $k-1$ to $k$. Given the previous state, it uses
$(\omega_k^*, \Delta\theta_k^*, T_k^*)$, the current pilot inputs, to drive the translational and rotational flight
models presented in Sections 6.1.2 and 6.1.3. A pseudo-code description of the routine
is provided in Figure 6.5. A few comments should be made concerning its implementation.
First, the required integrals are discretized using a time step of $\Delta_i$. Because
$\Delta t$ is the sample period, $\Delta t/\Delta_i$ iterations are performed per function call. In our
experiments, we will take $\Delta_i = 0.2$ seconds. Next, tildes are used to denote the dummy
variables for the integration. Third, even though the Euler update equations have no
functional dependence on the current mode $\gamma_k$, it can be shown that they still implement
the desired model from Table 6.1. Finally, the function SECOND_ORDER models
the response of the aircraft to pilot commands [40]. Inspection of the state model
in Figure 6.4 reveals that significant changes in the pilot commands only occur during
mode transitions (i.e., when $\gamma_k \neq \gamma_{k-1}$). For scans in which $\gamma_k = \gamma_{k-1}$, the
inputs $(\omega_k^*, \Delta\theta_k^*, T_k^*)$ remain nearly unchanged.6 As such, the pilot input sequences
are (nearly) staircase functions. Because the aircraft cannot respond instantaneously
to changes in these commands, SECOND_ORDER is used to model its response. An
example is shown in Figure 6.6 for the turn-rate input $\omega_k^*$.

6The variances $\sigma_\omega^2$, $\sigma_{\Delta\theta}^2$, and $\sigma_T^2$ are all small. They are meant to model effects such as air turbulence.

Figure 6.6: Example of airplane response to turn-rate commands from the pilot. $\{\omega_k^*\}$
is the command sequence; $\{\omega_k\}$ is the sequence of actual turn-rates achieved by the
aircraft.
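
The smoothing of a staircase command by an underdamped second-order system can be sketched in Python as below. The natural frequency, damping ratio, and semi-implicit Euler discretization are assumptions chosen for illustration; they are not the parameters or the scheme behind SECOND_ORDER in our experiments.

```python
def second_order_response(commands, dt, natural_freq=1.0, damping=0.5, y0=0.0):
    """Track a command sequence with y'' + 2*d*w*y' + w^2*y = w^2*u.

    commands     : commanded value u at each integration step
    dt           : integration step in seconds
    natural_freq : w in rad/s (hypothetical value)
    damping      : d, with 0 < d < 1 for an underdamped response (hypothetical value)
    """
    y, y_dot = y0, 0.0
    response = []
    for u in commands:
        y_ddot = natural_freq ** 2 * (u - y) - 2.0 * damping * natural_freq * y_dot
        y_dot += dt * y_ddot   # update the rate first (semi-implicit Euler)
        y += dt * y_dot        # then the output, using the updated rate
        response.append(y)
    return response


if __name__ == "__main__":
    # A turn-rate command that steps from 0 to 0.05 rad/s and is then held.
    achieved = second_order_response([0.05] * 100, dt=0.2)
    print(f"peak {max(achieved):.4f} rad/s, final {achieved[-1]:.4f} rad/s")
```

With a damping ratio below one the achieved value overshoots the command before settling, which is the qualitative behavior of an underdamped response to a step command.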

The algorithmic descriptions in Figures 6.4 and 6.5 complete the description of
$\mathbf{f}^{(\gamma_k)}$. The statistics of the process noise $\mathbf{u}_k$ are implicitly defined by the distributions
used to alter the pilot inputs (e.g., $U^{PT}_{\Delta\theta}$, $U^{PT}_{T}$, and $U^{CT}_{\omega}$). Because our mode
sequence $\{\gamma_k\}$ is Markov, the overall state model is jump Markov. In this sense, it bears
a  similarity  with  our  IMM-EKF.  However,  while  both  filters  use  multiple  maneuver 
modes,  the  IMM-EKF  state  equations  are  not  based  upon  the  actual  translational  and 
rotational  motion  of  an  airplane.  The  IMM-EKF  cannot  implement  the  state  equations 
from  this  chapter  because  it  has  no  way  of  estimating  an  airplane’s  angular  orientation 
(especially  the  bank  angle).  RCS  provides  the  particle  filter  with  this  capability.  In 
the  next  chapter,  we  will  demonstrate  the  significant  improvement  in  tracking  accuracy 
that  can  be  achieved  with  our  aerodynamically  valid  state  model. 


6.2  Measurement  Model  for  the  Particle  Filter 

In  the  previous  section,  we  presented  the  state  model  for  our  particle  filter.  In  this 
section,  we  specify  our  measurement  model  and  the  data  likelihood  function  required 
to  compute  the  particle  weights.  Similar  to  our  discussion  concerning  the  IMM-EKF 
in  Chapter  5,  we  begin  by  considering  the  measurement  model  for  a  single  target  in  a 
clutter-free  environment  with  zero  probability  of  miss.  The  issue  of  data  association 
for  measurements  of  uncertain  origin  is  addressed  in  the  next  section. 

As mentioned previously, one of the particle filter’s key advantages for joint
tracking/classification is its ability to incorporate RCS as a data feature. Assuming a single
target in a clutter-free environment, we have

$$ \mathbf{z}_k = \begin{bmatrix} \tau_k \\ d_k \\ \sigma_k \end{bmatrix}, \tag{6.16} $$

where $\tau_k$, $d_k$, and $\sigma_k$ correspond to measurements of delay, Doppler shift, and RCS,
respectively.7 Because we model the noise affecting each component of (6.16) as
independent, the data likelihood is just the product of the component likelihoods,

$$ p(\mathbf{z}_k \mid \mathbf{x}_k) = p(\tau_k \mid \mathbf{x}_k)\, p(d_k \mid \mathbf{x}_k)\, p(\sigma_k \mid \mathbf{x}_k). \tag{6.17} $$

Because our IMM-EKF uses delay and Doppler, the first two terms in (6.17) were
already defined in Section 5.2,

$$ p(\tau_k \mid \mathbf{x}_k) = \mathcal{N}\big(\tau_k;\, h^{(\tau)}(\mathbf{x}_k),\, \sigma_\tau^2\big), \tag{6.18} $$

$$ p(d_k \mid \mathbf{x}_k) = \mathcal{N}\big(d_k;\, h^{(d)}(\mathbf{x}_k),\, \sigma_d^2\big), \tag{6.19} $$

where $h^{(\tau)}$, the mapping from state to delay, is defined implicitly by (5.13) and $h^{(d)}$,
the mapping from state to Doppler shift, is defined implicitly by (5.14)-(5.15). All that
remains is for us to specify $p(\sigma_k \mid \mathbf{x}_k)$, the RCS likelihood function. Unfortunately, an
additive Gaussian noise model is inappropriate for RCS. To understand why this is so,
we must first consider the front-end processing that occurs in a radar receiver.

The operation of a typical radar system can be thought of as occurring in two stages:
front-end processing and back-end processing. The front-end is responsible for extracting
the desired data features (e.g., delay and Doppler shift) from the received signal.
The back-end is responsible for detection and estimation, given the data from the
front-end. The type of processing that occurs in the front-end is therefore determined by the
desired feature set. Because most radars measure Doppler shift using a complex fast
Fourier transform, we will assume that the front-end of our radar consists of an
in-phase channel and a quadrature channel (i.e., two matched filters, 90 degrees out of
phase) [64]. With this configuration, it is actually possible for our radar to extract the
complex scattering coefficient from the reflected waveform. Because we will assume
that only one polarization is measured, we suppress the vv or hh subscript and denote
the scattering coefficient from scan $k$ as $s_k$.

7Because both conventions are widespread, we use $\sigma$ to represent RCS and standard deviation. In all
cases, the identity of the variable should be clear from either its subscript or its superscript.

Because RCS is equal to $|s_k|^2$ (see Section 2.3), it would seem that we are throwing
away useful information by not using the scattering coefficient itself. However, the
relatively poor range resolution of VHF-band radars makes the phase of $s_k$ unreliable.
This leaves us with $|s_k|$, which is equivalent to using RCS. Because the thermal noise
in each channel of our receiver can be modeled as additive Gaussian, we arrive at the
following model,

$$ \sigma_k = \big(\mathrm{Re}(s_k) + n_k^{(I)}\big)^2 + \big(\mathrm{Im}(s_k) + n_k^{(Q)}\big)^2, \tag{6.20} $$

where $\{n_k^{(I)}\}$ and $\{n_k^{(Q)}\}$ are the in-phase and quadrature noise processes, respectively.
We assume that both are mutually independent and zero-mean Gaussian with variance
$\sigma_{th}^2$. In this case, $\sigma_k$ will have a noncentral chi-squared distribution with two degrees
of freedom,

$$ p(\sigma_k \mid \mathbf{x}_k) = \frac{1}{2\sigma_{th}^2} \exp\!\left(-\frac{\sigma_k + |s_k|^2}{2\sigma_{th}^2}\right) I_0\!\left(\frac{\sqrt{\sigma_k}\,|s_k|}{\sigma_{th}^2}\right), \tag{6.21} $$

where $I_0$ is the zero-order modified Bessel function of the first kind. As a noncentral
chi-squared random variable, the first two moments of $\sigma_k$ are

$$ E[\sigma_k] = |s_k|^2 + 2\sigma_{th}^2, \tag{6.22} $$

$$ \mathrm{var}[\sigma_k] = 4\sigma_{th}^2\big(|s_k|^2 + \sigma_{th}^2\big). \tag{6.23} $$

Thus, we see that $\sigma_k$ is biased, and its variance grows with $|s_k|^2$. In Chapter 7, we will
investigate the effect that the scattering coefficient variance has on the particle filter’s
performance.
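
The measurement model (6.20) and the likelihood (6.21) can be checked numerically with the Python sketch below. The function names are hypothetical, and the direct use of `np.i0` is an assumption that is only safe for moderate arguments, since the unscaled Bessel function overflows when $\sqrt{\sigma_k}\,|s_k|/\sigma_{th}^2$ is large.

```python
import numpy as np


def simulate_rcs(s, noise_var, size, rng):
    """Draw RCS measurements from (6.20) for a complex scattering coefficient s."""
    n_i = rng.normal(0.0, np.sqrt(noise_var), size)   # in-phase noise
    n_q = rng.normal(0.0, np.sqrt(noise_var), size)   # quadrature noise
    return (s.real + n_i) ** 2 + (s.imag + n_q) ** 2


def rcs_likelihood(sigma_meas, s_mag, noise_var):
    """Evaluate the noncentral chi-squared density (6.21) at sigma_meas >= 0."""
    sigma_meas = np.asarray(sigma_meas, dtype=float)
    bessel_arg = np.sqrt(sigma_meas) * s_mag / noise_var
    return (np.exp(-(sigma_meas + s_mag ** 2) / (2.0 * noise_var))
            * np.i0(bessel_arg) / (2.0 * noise_var))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s, noise_var = 1.0 + 2.0j, 0.5
    samples = simulate_rcs(s, noise_var, 200_000, rng)
    print("sample mean:", samples.mean(), "predicted:", abs(s) ** 2 + 2 * noise_var)
    print("sample var: ", samples.var(),
          "predicted:", 4 * noise_var * (abs(s) ** 2 + noise_var))
```

Comparing the sample moments with (6.22) and (6.23) makes the bias term $2\sigma_{th}^2$ visible: the average measured RCS sits above the true value $|s_k|^2$ by exactly that amount.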

Before continuing, a comment should be made concerning (6.20). Strictly speaking,
the receiver measures the magnitude of the in-phase and quadrature components
of the received waveform; Equation (2.2) must be used to recover the magnitude of
the I/Q components of $s_k$. Of course, (2.2) requires knowledge of the target’s position.
Because the typical error in position estimate is small when compared to the distances
between the target and the transmitters and receiver, we make the simplifying assumption
that the I/Q components of $s_k$ are measured directly.

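This completes the specification of the measurement model for our particle filter.
In summary, the measurement process is of the form $\mathbf{z}_k = \mathbf{h}_k(\mathbf{x}_k, \mathbf{w}_k)$, where the
influence of $\mathbf{w}_k$ is no longer additive because of (6.20). Our particle weight update
equation is then

$$ w_k^{(i)} \propto w_{k-1}^{(i)}\, \frac{p(\mathbf{z}_k \mid \mathbf{x}_k^{(i)})\, p(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(i)})}{\pi(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(i)}, \mathbf{z}_k)} \tag{6.24} $$

for each particle index $i$, where

$$ p(\mathbf{z}_k \mid \mathbf{x}_k^{(i)}) = \mathcal{N}\big(\tau_k;\, h^{(\tau)}(\mathbf{x}_k^{(i)}),\, \sigma_\tau^2\big)\; \mathcal{N}\big(d_k;\, h^{(d)}(\mathbf{x}_k^{(i)}),\, \sigma_d^2\big)\; p(\sigma_k \mid \mathbf{x}_k^{(i)}), \tag{6.25} $$

and $p(\sigma_k \mid \mathbf{x}_k^{(i)})$ is given by (6.21) with $s_k$ determined by table look-up. More precisely,
given $\mathbf{p}_k$ and $(\psi_k, \theta_k, \phi_k)$ from $\mathbf{x}_k^{(i)}$, the incident and scattered directions for
the FM radio signal can be determined. These angles are then used to access the
RCS table compiled for the target class corresponding to the label $\zeta$ from $\mathbf{x}_k^{(i)}$.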
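
A weight update of the general form (6.24) can be sketched in Python in the log domain, which avoids underflow when the three factors of (6.25) are multiplied. The function names are hypothetical; the RCS term is assumed to be supplied as a precomputed array of log-likelihoods, and setting the prior and proposal terms equal (the default) corresponds to proposing from the state model itself.

```python
import numpy as np


def gaussian_logpdf(x, mean, var):
    """Log of the Gaussian density N(x; mean, var), elementwise."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def update_weights(prev_weights, log_lik, log_prior=0.0, log_proposal=0.0):
    """Normalized weights proportional to w_prev * likelihood * prior / proposal."""
    log_w = np.log(prev_weights) + log_lik + log_prior - log_proposal
    log_w = log_w - np.max(log_w)   # shift so the largest term is exp(0)
    weights = np.exp(log_w)
    return weights / weights.sum()


if __name__ == "__main__":
    # Two particles with predicted delay and Doppler; the measurement favors the first.
    pred_tau = np.array([1.00e-4, 1.20e-4])   # seconds (hypothetical values)
    pred_d = np.array([50.0, 80.0])           # Hz (hypothetical values)
    log_rcs = np.log(np.array([0.3, 0.3]))    # stand-in for the log of (6.21)
    log_lik = (gaussian_logpdf(1.01e-4, pred_tau, (5e-6) ** 2)
               + gaussian_logpdf(52.0, pred_d, 5.0 ** 2)
               + log_rcs)
    print(update_weights(np.array([0.5, 0.5]), log_lik))
```

Summing the three log terms is the log-domain counterpart of the product in (6.25); the normalization at the end is what allows constant factors to be dropped from the likelihood.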


6.3  Data  Association  for  the  Particle  Filter 

So  far,  in  the  development  of  our  particle  filter,  we  have  restricted  our  attention  to 
tracking  a  single  target  in  a  clutter-free  environment  with  zero  probability  of  miss.  In 
this  section,  we  extend  our  discussion  to  include  filtering  in  the  presence  of  multiple 
targets, false alarms, and nonzero probability of miss. In this case, our data vector
becomes

$$ \mathbf{z}_k = \big[\, (\tau_k^{(1)}\; d_k^{(1)}\; \sigma_k^{(1)}) \;\cdots\; (\tau_k^{(m_k)}\; d_k^{(m_k)}\; \sigma_k^{(m_k)}) \,\big]', \tag{6.26} $$

where $m_k$ is the number of returns in scan $k$. Similar to our presentation in
Sections 5.3.1-5.3.3 for the EKF, the uncertain origin of each delay-Doppler-RCS triplet
will now be addressed. We begin by considering the case of a single target in the
presence of clutter and missed detections.

6.3.1  PDA  particle  filter 

In  discussing  data  association  for  our  particle  filter,  we  use  the  same  notation  that  was 
introduced in Section 5.3.1. Therefore, $\beta$ and $P_D$ will represent the clutter density
and probability of detection, respectively. Furthermore, we use the superscript notation
$\mathbf{z}_k^{(j)}$ to denote a single delay-Doppler-RCS triplet, $(\tau_k^{(j)}\; d_k^{(j)}\; \sigma_k^{(j)})$. Where it does not
lead to confusion, we will refer to $\mathbf{z}_k^{(j)}$ as a single “measurement.” In this section, we
consider the case of tracking a single target when $\beta > 0$ and $P_D < 1$. Because these
additional assumptions do not affect our state model, we only need to reformulate the
update equation for the particle weights. Our approach is equivalent to the one in [65].

As presented in Section 5.3.1, there are $m_k + 1$ association hypotheses in the
single-target case. Either one of the $m_k$ measurements is correct, or all are false. As before,
the hypothesis that $m_k - 1$ measurements are false and $\mathbf{z}_k^{(j)}$ is correct will be denoted
as $\Gamma_k^{(j)}$ (for $j = 1, \ldots, m_k$). The hypothesis that all $m_k$ measurements are false will
be denoted as $\Gamma_k^{(0)}$. Referring to (6.24), we see that the likelihood function $p(\mathbf{z}_k \mid \mathbf{x}_k)$ is
required to update the particle weights. When there is uncertainty in the origin of the
measurements in $\mathbf{z}_k$, we have

$$ p(\mathbf{z}_k \mid \mathbf{x}_k) = \sum_{j=0}^{m_k} p(\mathbf{z}_k \mid \Gamma_k^{(j)}, \mathbf{x}_k)\, \Pr(\Gamma_k^{(j)} \mid \mathbf{x}_k), \tag{6.27} $$

because $\{\Gamma_k^{(j)}\}_{j=0}^{m_k}$ are mutually exclusive and the only feasible associations. Recall,
each $\Gamma_k^{(j)}$ contains three items: (1) the number of false alarms, (2) the number of
detections, and (3) a measurement index for each target hypothesized to have been detected.
Because the probability of detection is assumed to be independent of the round-trip
range, we have $\Pr(\Gamma_k^{(j)} \mid \mathbf{x}_k) = \Pr(\Gamma_k^{(j)})$. Substituting the results from (5.27) and
(5.29), we have

$$
\begin{aligned}
p(\mathbf{z}_k \mid \mathbf{x}_k) &= p(\mathbf{z}_k \mid \Gamma_k^{(0)}, \mathbf{x}_k)\,\Pr(\Gamma_k^{(0)}) + \sum_{j=1}^{m_k} p(\mathbf{z}_k \mid \Gamma_k^{(j)}, \mathbf{x}_k)\,\Pr(\Gamma_k^{(j)}) && (6.28) \\
&= V_k^{-m_k}(1 - P_D P_G)\,\mu_F(m_k) + \frac{V_k^{-(m_k-1)} P_D\,\mu_F(m_k - 1)}{m_k} \sum_{j=1}^{m_k} p(\mathbf{z}_k^{(j)} \mid \mathbf{x}_k) && (6.29) \\
&= \frac{e^{-\beta V_k}\,\beta^{m_k-1}}{m_k!} \left[ (1 - P_D P_G)\,\beta + P_D \sum_{j=1}^{m_k} p(\mathbf{z}_k^{(j)} \mid \mathbf{x}_k) \right], && (6.30)
\end{aligned}
$$

where $V_k$ is the gate volume at scan $k$, $P_G$ is the probability that the target measurement
is contained within that volume, and $\mu_F$ is assumed to be Poisson distributed. Because
the particle filter weights are normalized, the constant term in (6.30) can be neglected,
and we are left with the following modified weight update equation,


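$$ w_k^{(i)} \propto w_{k-1}^{(i)} \left[ (1 - P_D P_G)\,\beta + P_D \sum_{j=1}^{m_k} p(\mathbf{z}_k^{(j)} \mid \mathbf{x}_k^{(i)}) \right] \frac{p(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(i)})}{\pi(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(i)}, \mathbf{z}_k)}, \tag{6.31} $$

where $p(\mathbf{z}_k^{(j)} \mid \mathbf{x}_k^{(i)})$ is defined in (6.25). Equation (6.31) is the probabilistic data
association rule for updating the particle weights. We see that the likelihood $p(\mathbf{z}_k \mid \mathbf{x}_k^{(i)})$
from (6.24) has been replaced by a sum over the component likelihoods. The constant
term $(1 - P_D P_G)\beta$ can be thought of as a threshold that must be exceeded in order for
a particle’s likelihood to stand out from the rest. For example, if $\beta$ is very large, the
PDA update rule yields $p(\mathbf{z}_k \mid \mathbf{x}_k^{(i)}) \approx C(1 - P_D P_G)\beta$ for all $i$, where $C$ is the constant
prefactor in (6.30). In this case, those particles with large values of $p(\mathbf{x}_k^{(i)} \mid \mathbf{x}_{k-1}^{(i)})$ will end up
with the most significant weights,
implying that the prior is more trustworthy than $\mathbf{z}_k$ when the clutter density is high.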
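
The bracketed term of the PDA rule (6.31) can be sketched in Python as follows. The function name and argument layout are assumptions: `meas_lik[i, j]` is taken to hold the component likelihood of measurement $j$ for particle $i$, and the ratio of prior to proposal is passed in as a single optional factor.

```python
import numpy as np


def pda_update_weights(prev_weights, meas_lik, beta, p_d, p_g, prior_over_proposal=1.0):
    """PDA weight update in the form of (6.31).

    prev_weights        : array of shape (num_particles,)
    meas_lik            : array of shape (num_particles, m_k) of component likelihoods
    beta                : clutter density
    p_d, p_g            : detection and gate probabilities
    prior_over_proposal : scalar or array, p(x_k | x_{k-1}) divided by the proposal
    """
    meas_lik = np.asarray(meas_lik, dtype=float)
    bracket = (1.0 - p_d * p_g) * beta + p_d * meas_lik.sum(axis=1)
    weights = np.asarray(prev_weights, dtype=float) * bracket * prior_over_proposal
    return weights / weights.sum()


if __name__ == "__main__":
    lik = np.array([[0.2, 4.0],    # particle 0 explains the second return well
                    [0.1, 0.3]])   # particle 1 explains neither return well
    for beta in (1e-3, 1e3):
        print(beta, pda_update_weights([0.5, 0.5], lik, beta, p_d=0.9, p_g=0.99))
```

Running the example with a small and a large clutter density shows the threshold effect described above: for large $\beta$ the bracket is nearly identical for every particle, so the measurements barely change the weights.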


6.3.2  Joint  data  association  for  particle  filters 

Having discussed how to alter the particle weight equation when tracking a single target
in the presence of clutter and missed detections, we now consider the multitarget
scenario. Recall, the extension from single target data association to multitarget data
association (PDAF to JPDA) was relatively straightforward for the extended Kalman filter.
The association probabilities $\Pr(\Gamma_k^{(l)} \mid \mathbf{z}_{1:k})$ for targets whose gate volumes overlapped
just needed to be computed jointly (see Section 5.3.3). Unfortunately, the extension to
multiple targets is not as simple for particle filters.
Define $N_t$ as the number of targets that we wish to track. Then, the state of the multitarget system is

\mathbf{x}_k = \{ \mathbf{x}_k^{(1)}, \mathbf{x}_k^{(2)}, \ldots, \mathbf{x}_k^{(N_t)} \},   (6.32)

where the superscript notation $\mathbf{x}_k^{(t)}$ is used to denote those components of $\mathbf{x}_k$ that correspond to target $t$.$^{8}$ The multitarget state vector is just the concatenation of $N_t$ single-target "substates." Note that (6.32) does not imply any sort of relationship between $\mathbf{x}_k^{(t_1)}$ and $\mathbf{x}_k^{(t_2)}$ for $t_1 \neq t_2$. In fact, each $\mathbf{x}_k^{(t)}$ is assumed to evolve independently of all others. With these definitions, the likelihood function can be expressed as

p(\mathbf{z}_k \mid \mathbf{x}_k) = \sum_{l=1}^{M_k} p(\mathbf{z}_k \mid \Gamma_k^{(l)}, \mathbf{x}_k) \Pr(\Gamma_k^{(l)} \mid \mathbf{x}_k),   (6.33)

where $M_k$ is the total number of association hypotheses at scan $k$. If we assume that $P_D$ is a constant and $N_t$ is known, the association probability is independent of the state vector, $\Pr(\Gamma_k^{(l)} \mid \mathbf{x}_k) = \Pr(\Gamma_k^{(l)})$. This probability can then be found in much the same way as the single-target case. The conditional likelihood term from (6.33) is

p(\mathbf{z}_k \mid \Gamma_k^{(l)}, \mathbf{x}_k) = V^{-N_F^{(l)}} \prod_{t \in \mathcal{T}^{(l)}} p(\mathbf{z}_k^{(j_t^l)} \mid \mathbf{x}_k^{(t)}),   (6.34)

where $N_F^{(l)}$ is the number of false alarms under $\Gamma_k^{(l)}$, and the product is taken over the targets that were detected. The superscript in $\mathbf{z}_k^{(j_t^l)}$ identifies the measurement from $\mathbf{z}_k$ that is associated with target $t$ under hypothesis $\Gamma_k^{(l)}$. The component likelihood $p(\mathbf{z}_k^{(j)} \mid \mathbf{x}_k^{(t)})$ can be calculated using (6.25).

$^{8}$ Note that, in this section only, the term $\mathbf{x}_k^{(t)}$ will be used to denote a single-target state, not the $t$th sample from a proposal distribution.

On the surface, it would seem that the extension to multiple targets is trivial for particle filters. Conceptually, this is true, but there is a serious practical difficulty. Consider the definition of the multitarget state vector in (6.32) again. As a concatenation of single-target substates, the resulting dimension of $\mathbf{x}_k$ is $N_t$ times larger than in the single-target case. To take a numeric example, tracking $N_t = 10$ targets using the flight model from Section 6.1 would require a state with $10 \cdot 14 = 140$ components! To adequately represent the posterior density in such a high-dimensional space, the number of particles would need to be substantially larger than in the single-target case. Furthermore, as $N_t$ grows, the chance that at least one of the component likelihoods from (6.34) is negligible for each particle increases. This, in turn, can lead to degeneracy of the particle weights and the need for frequent resampling. In other words, many good single-state estimates will be rejected during resampling because they belong to multitarget states that also contain poor single-state estimates. One technique that has been proposed in the literature to reduce this effect is known as independent partition particle filtering [75]. In this algorithm, substates are swapped between particles so that the components $\mathbf{x}_k^{(t)}$ in a given state $\mathbf{x}_k$ are either mostly good or mostly bad. The particle weights must then be modified to undo the bias introduced by this crossover operation.
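The degeneracy argument can be illustrated with a toy experiment. The sketch below is only an illustration under simplifying assumptions that are not part of the flight model: each substate is reduced to a single standard normal error, each component likelihood is an unnormalized Gaussian in that error, and the joint weight is the product of the component likelihoods in the spirit of (6.34). The function names are hypothetical.

```python
import numpy as np


def effective_sample_size(weights):
    """Effective sample size 1 / sum(w_i^2) of the normalized weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)


def joint_state_ess(n_particles, n_targets, seed=0):
    """ESS of a particle set whose weight is a product over target substates."""
    rng = np.random.default_rng(seed)
    # One scalar error per particle and per target (toy stand-in for a substate).
    err = rng.standard_normal((n_particles, n_targets))
    # Product of component likelihoods becomes a sum of log-likelihoods.
    log_lik = -0.5 * np.sum(err ** 2, axis=1)
    weights = np.exp(log_lik - log_lik.max())   # subtract max for stability
    return effective_sample_size(weights)


ess_by_targets = {n_t: joint_state_ess(2000, n_t) for n_t in (1, 2, 5, 10)}
```

In this toy setting the effective sample size shrinks as targets are concatenated into one state, so fewer particles carry meaningful weight even though every individual substate is sampled equally well.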

A second option for multitarget tracking is to simply maintain a separate particle filter for each target. In this case, computation grows linearly with $N_t$, as opposed to the potentially exponential growth in the case discussed above. The disadvantage, of course, is that data association can no longer be performed jointly across all $N_t$ targets. Instead, data association is performed separately for each filter using the PDA update (6.31) from the previous section. While this might sound like a substantial disadvantage, recall that joint association is only required when the gates of two or more targets overlap. For gates to overlap in our application, the targets would have to generate similar delay, Doppler, and RCS measurements. Across multiple scans, similar delay and Doppler measurements imply that the targets have roughly the same speed, heading, and location. Even if this were the case, their RCS measurements might still differ enough for their gates to remain disjoint.$^{9}$

Therefore, while it is conceptually attractive to handle data association jointly in the multitarget scenario, practically speaking, it may be unnecessary. For this reason, we will use a separate particle filter for each target in our implementation. In Chapter 7, we will show that our PF-based system outperforms our IMM-EKF, which uses joint data association, even when multiple targets are present.

$^{9}$ Identical airplanes flying in formation would generate the same delay, Doppler, and RCS. In this case, though, sensor resolution is more of a limiting factor than data association.

CHAPTER 7


EXPERIMENTAL  RESULTS 


In  this  chapter,  we  present  experimental  results  comparing  the  performance  of  the  IMM 
extended  Kalman  filter  described  in  Chapter  5  and  two  variants  of  the  particle  filter 
from  Chapter  6.  In  the  following  sections,  we  will  compare  the  performance  of  these 
systems  on  a  series  of  increasingly  difficult  tasks.  First,  we  consider  the  simplest  case 
of  tracking  a  single  target  with  known  identity.  Then,  we  extend  the  single-target  case 
to  allow  both  false  alarms  and  missed  detections.  Next,  we  explore  the  classification 
capability  of  the  particle  filter  when  tracking  a  single  target.  Finally,  we  conclude  with 
the  most  challenging  scenario:  joint  tracking  and  classification  of  multiple  targets  in 
the  presence  of  clutter  and  missed  detections.  However,  before  we  begin,  there  are 
several  preliminary  matters  that  must  be  addressed,  such  as  our  method  for  generating 
simulated  data  and  definitions  for  the  metrics  that  we  will  use  to  gauge  tracking  and 
classification  performance. 

7.1  Preliminaries 

Although many of the techniques presented throughout this report are applicable to a broad range of tracking and classification problems, we focus here on the specific application of commercial FM radio-based passive radar. This decision impacts our simulations in two important ways. First, operation in the VHF band (30-300 MHz) places practical limitations on the type and quality of measurement data available. Whereas a typical active radar operating at X-band (8-12.5 GHz) would most likely use delay, angle of arrival (both azimuth and elevation), and possibly Doppler shift, our extended Kalman filter will operate with only delay and Doppler, while our particle filter uses delay, Doppler, and radar cross section. We neglect angle measurements at VHF frequencies because typical antenna dimensions result in beamwidths that are too large to be useful. For all simulations, we take a delay variance of $\sigma_\tau^2 = 0.0001$ ms$^2$ and a Doppler variance of $\sigma_f^2 = 1$ Hz$^2$. This corresponds to range and range rate uncertainties from a single measurement on the order of 1500 m and 1.5 m/s, respectively. These are reasonable values for a commercial FM broadcast signal.
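The quoted range and range rate uncertainties can be checked with a short calculation. This is a rough sketch under two assumptions that are not stated in the text: the delay and Doppler are treated as round-trip quantities, so the path-length uncertainty is halved to obtain a range figure, and a carrier frequency of 100 MHz is used as a representative value for the FM broadcast band.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def range_sigma_from_delay(delay_var_ms2):
    """Range standard deviation (m) implied by a delay variance in ms^2."""
    sigma_s = math.sqrt(delay_var_ms2) * 1e-3   # convert ms to s
    return 0.5 * C * sigma_s                    # halve the round-trip path


def range_rate_sigma_from_doppler(doppler_var_hz2, carrier_hz):
    """Range rate standard deviation (m/s) implied by a Doppler variance in Hz^2."""
    wavelength = C / carrier_hz
    return 0.5 * wavelength * math.sqrt(doppler_var_hz2)


sigma_range = range_sigma_from_delay(0.0001)                  # about 1.5e3 m
sigma_range_rate = range_rate_sigma_from_doppler(1.0, 100e6)  # about 1.5 m/s
```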

In  addition  to  measurement  availability  and  quality,  focusing  on  FM-band  passive 
radar influences the design of our simulations in a second important way. Namely,
because transmitters are essentially provided for free in a passive radar, multistatic
systems  are  quite  feasible  (and,  in  a  sense,  necessary  to  make  up  for  the  shortcomings 
in  data  availability  and  quality).  Because  a  constant  round-trip  range  contour  in  the 
bistatic  scenario  is  an  ellipsoid,  a  minimum  of  three  delay  measurements  are  needed  to 
locate  a  target  in  three-dimensional  space.  In  addition,  we  have  seen  in  Equations  (5.14) 
and  (5.15)  that  Doppler  shift  is  related  to  the  projection  of  the  target’s  velocity  vector 
onto  the  inward  normal  of  the  constant  range  contour  at  the  location  of  the  target.  Thus, 
a  minimum  of  three  transmitters  are  also  required  for  unique  determination  of  a  target’s 
three-dimensional  velocity  vector  from  measurements  of  Doppler  shift.  Of  course,  in 
the  presence  of  noise,  we  could  use  more  than  three  transmitters  to  improve  tracking 
accuracy,  but  any  such  benefit  would  need  to  be  weighed  against  the  resulting  increase 
in  hardware  cost  required  to  field  such  a  system.  As  such,  we  will  use  the  minimum 
number  of  transmitters  in  our  simulations. 
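The geometry behind these statements can be sketched as follows. The code is an illustrative assumption rather than the measurement model of Equations (5.14) and (5.15): it takes the delay to be the total transmitter-target-receiver path length divided by the speed of light, and the Doppler shift to be the negative rate of change of that path length divided by the wavelength. The function name and argument order are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def bistatic_delay_doppler(pos, vel, tx, rx, carrier_hz):
    """Delay (s) and Doppler shift (Hz) for one transmitter/receiver pair."""
    pos, vel, tx, rx = (np.asarray(a, dtype=float) for a in (pos, vel, tx, rx))
    from_tx = pos - tx
    from_rx = pos - rx
    r_tx = np.linalg.norm(from_tx)
    r_rx = np.linalg.norm(from_rx)
    delay = (r_tx + r_rx) / C
    # The gradient of the total path length is the outward normal of the
    # constant-range ellipsoid; project the velocity onto it.
    path_rate = vel @ (from_tx / r_tx + from_rx / r_rx)
    doppler = -path_rate * carrier_hz / C
    return delay, doppler


rx = [0.0, 0.0, 0.0]            # receiver at the origin
tx = [20_000.0, 0.0, 0.0]       # hypothetical transmitter location
target = [10_000.0, 0.0, 10_000.0]
_, f_tangent = bistatic_delay_doppler(target, [100.0, 0.0, 0.0], tx, rx, 100e6)
delay_up, f_up = bistatic_delay_doppler(target, [0.0, 0.0, 100.0], tx, rx, 100e6)
```

A velocity tangent to the ellipsoid produces no Doppler shift, which is why a single transmitter constrains only one velocity component and three are needed for a full three-dimensional velocity estimate.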

For simplicity, we also assume a single receiver location, although both the IMM-EKF and PF could be fielded with receivers at multiple sites.$^{1}$ In a sense, by restricting our simulations to three transmitters and a single receiver, we are considering the most challenging, but least expensive, configuration of an FM-band passive radar system. For all experiments, we fix the sample period to $\Delta t = 0.4$ s. As discussed earlier, we also assume that only one FM radio signal is processed at a time. Therefore, the measurement acquisition cycles for the three transmitters are interleaved, with each station processed for 0.4 out of every 1.2 s.

In our experiments, we will consider the two trajectories depicted in Figure 7.1. The triangular patches in the upper panels of the figure indicate the time-varying angular orientation of the aircraft. Both trajectories last for 30 s and are realizations of the flight model introduced for the particle filter in Section 6.1. Under this model, at any given time, an aircraft operates in one of the following three rotation modes:

•  mode  1  (CV):  constant  velocity  fixed-altitude  flight, 

•  mode  2  (PT):  longitudinal  climb  or  descent  with  thrust  increase, 

•  mode  3  (CT):  constant-speed  coordinated  turn. 

In  the  top  row  of  Figure  7.1,  the  projection  of  the  trajectory  onto  the  ground  plane 
is  color-coded  to  indicate  the  underlying  mode  sequence.  The  lower  two  panes  in  the 
figure  indicate  the  location  of  the  maneuver  relative  to  the  receiver  and  FM  transmitters. 
The  three  transmitters  are  actual  commercial  FM  radio  facilities,  all  located  in  the  state 
of  Maryland.  We  use  their  actual  latitudes,  longitudes,  and  altitudes  and  identify  each 
by  its  FCC  call  sign.  Because  we  adopt  a  receiver-centered  system  as  an  approximation 
for  a  true  inertial  reference  frame,  the  receiver  is  located  at  the  origin. 

$^{1}$ We use the term "receiver" to refer broadly to the entire collection of hardware needed for the reception of both direct path and reflected commercial FM signals at a single location.

Figure 7.1: Top row: trajectories for the roll-roll maneuver and the roll-pitch maneuver. The projection of the path onto the ground plane is color-coded to indicate the time-varying rotation mode. The bottom row shows the locations of the target maneuvers relative to the receiver and the FM radio transmitters.

Table 7.1: Ranges of Values for Pilot Command Variables

Mode   Command Variable              True Range          PF Range
                                     min      max        min      max
PT     $|\Delta\theta^*|$ (deg)      7.45     14.90      5.59     22.35
PT     $T^*$ (tS8)                   2.5      5          1.25     5
CT     $|\omega^*|$ (deg/sec)        3.45     6.88       2.58     10.31
CT     $\Delta\theta^*$ (deg)        2        2          1        3

F-4E Phantom

Length           19.20 m
Wingspan         11.71 m
Height           5.03 m
Wing Area        49.23 m$^2$
Max Speed        670 m/s
Weight (empty)   13397 kg

Figure 7.2: The F-4E Phantom fighter and its dimensions. The kinematic coefficients used in our single-target experiments were chosen to simulate the flight characteristics of the F-4E.

As discussed in Section 6.1.4, our particle filter's flight (or state) model is specified by its Markov transition matrix and the distributions from which the pilot command variables $\{\omega_k^*, \Delta\theta_k^*, T_k^*\}$ are drawn. The transition matrix governs the frequency of mode changes while the turn-rate, pitch, and thrust commands produce the actual maneuvers. Both trajectories were generated using the transition matrix

P = \begin{bmatrix} 0.938 & 0.031 & 0.031 \\ 0.062 - \epsilon & 0.938 & \epsilon \\ 0.038 - \epsilon & \epsilon & 0.962 \end{bmatrix},   (7.1)

where $\epsilon$ is a small probability, and the index-to-mode mapping is 1 → CV, 2 → PT, and 3 → CT. $P$ yields expected dwell periods of 6, 6, and 10 s for the CV, PT, and CT modes, respectively. The third and fourth columns in Table 7.1 list the ranges over which the command variables were allowed to vary in order to produce the roll-roll and roll-pitch maneuvers shown in Figure 7.1. The kinematic coefficients used to generate these two trajectories were chosen to match those of an F-4E Phantom fighter plane, as shown in Figure 7.2. The F-4E Phantom was designed in the late 1960s and earned its reputation for excellent performance and good maneuverability during the Vietnam War. It has since been retired from combat use. In all experiments, the particle filter will use the true value of the transition matrix $P$. However, to make the task more challenging, the ranges of the particle filter's command variables will be extended relative to the true ranges used to generate the trajectories. These extended ranges are listed in the fifth and sixth columns of Table 7.1.
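The quoted dwell periods follow from the diagonal entries of the transition matrix and the 0.4 s sample period. The sketch below assumes one particular convention, namely that the dwell period is the sample period times the mean number of consecutive self-transitions, p/(1 - p); this convention is an inference that reproduces the quoted values, and counting the scan of entry as well would add one sample period. The value chosen for the small probability is hypothetical and affects only the off-diagonal entries.

```python
def expected_dwell_seconds(p_stay, dt):
    """Mean time spent in a mode beyond the scan of entry, in seconds."""
    if not 0.0 <= p_stay < 1.0:
        raise ValueError("p_stay must lie in [0, 1)")
    # Mean number of consecutive self-transitions of a Markov chain.
    return dt * p_stay / (1.0 - p_stay)


def flight_transition_matrix(eps):
    """Mode transition matrix with the small probability eps left as a parameter."""
    return [
        [0.938, 0.031, 0.031],
        [0.062 - eps, 0.938, eps],
        [0.038 - eps, eps, 0.962],
    ]


DT = 0.4      # sample period in seconds
EPS = 0.001   # hypothetical value, for illustration only
P = flight_transition_matrix(EPS)
dwell = {mode: expected_dwell_seconds(P[i][i], DT)
         for i, mode in enumerate(("CV", "PT", "CT"))}
```

Under this convention the CV and PT modes persist for roughly 6 s and the CT mode for roughly 10 s.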

7.1.1  Performance  metrics 

In order to present a wide range of results as compactly as possible, we will need to quantify various aspects of system performance using simple figures of merit whenever possible. We begin by defining the scan-$k$ estimation error from the $n$th Monte Carlo trial as

e_{k,n} = \left\| \hat{\mathbf{p}}_{k,n} - \mathbf{p}_k^{\text{true}} \right\|,   (7.2)

where $\hat{\mathbf{p}}_{k,n}$ is the position estimate produced by the tracker (EKF or PF), and $\mathbf{p}_k^{\text{true}}$ denotes the ground truth for scan $k$. In our experiments, a target will be considered "lost" if $e_{N_k,n} > 500$ m, where $N_k$ is the number of scans per trial. ($N_k = 76$ for a 30 s maneuver with $\Delta t = 0.4$ s.) If $N_{MC}$ is the number of Monte Carlo trials and $N_L$ is the total number of lost tracks during these trials, we then define the RMS error as


e_{RMS} = \sqrt{ \frac{1}{(N_{MC} - N_L) N_k} \sum_{n=1}^{N_{MC} - N_L} \sum_{k=1}^{N_k} (e_{k,n})^2 }.   (7.3)


By excluding lost tracks from (7.3), we can obtain the RMS error with fewer Monte Carlo trials because extremely bad runs will not skew the average. The tracking performance of each filter must then be gauged using both quantities: the RMS error $e_{RMS}$ and the number of lost tracks $N_L$. The computational complexity of each filter will be evaluated using the average time per trial. Finally, classification performance (for the particle filter) will be quantified using confusion matrices. For all experiments, we take $N_{MC} = 100$. This number was found to offer a reasonable trade-off between the precision of our results and the computational resources needed to conduct the simulations.
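The two tracking metrics can be computed together from a table of per-scan errors. The following is a minimal sketch, assuming that the errors are stored as an array with one row per Monte Carlo trial and one column per scan, and that a track is declared lost when its error at the final scan exceeds the threshold; the function name and the handling of the case in which every track is lost are illustrative choices.

```python
import numpy as np


def tracking_metrics(errors, lost_threshold=500.0):
    """Return (RMS error over retained trials, number of lost tracks).

    errors : (N_MC, N_k) array of position errors in meters,
             one row per Monte Carlo trial and one column per scan.
    """
    errors = np.asarray(errors, dtype=float)
    lost = errors[:, -1] > lost_threshold      # final-scan error test
    kept = errors[~lost]
    n_lost = int(lost.sum())
    if kept.size == 0:
        return float("nan"), n_lost            # no retained trials to average
    # Mean over (N_MC - N_L) * N_k squared errors, then square root.
    e_rms = float(np.sqrt(np.mean(kept ** 2)))
    return e_rms, n_lost


example_errors = np.array([[3.0, 4.0],
                           [100.0, 600.0],     # lost: final error above 500 m
                           [0.0, 0.0]])
e_rms, n_lost = tracking_metrics(example_errors)
```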


7.1.2  Track  maintenance 

Although we want our simulations to be as realistic as possible, there is one key difference between our filters and the software component of an actual tracking system: neither of our systems handles track maintenance. Track maintenance typically involves the computation of a track score function to accomplish three tasks: (1) initiation of potential new tracks using recent measurements that were not associated with any existing tracks, (2) confirmation of initiated tracks whose scores have surpassed a prescribed threshold, and (3) deletion of confirmed tracks whose scores have fallen below a prescribed threshold [64]. Using this terminology, each confirmed track should correspond to an actual target.

We avoid the issue of track maintenance in our simulations by assuming that the number of targets $N_t$ is known. This assumption does not eliminate the need for data association because, while both filters search for exactly $N_t$ targets each scan, the correct measurement-to-track assignment remains unknown. Without a track maintenance routine, we need an explicit way of initializing the $N_t$ tracks for each filter. In a real system, unassociated delay and Doppler measurements from consecutive scans would be clustered and fed to a nonlinear solver to provide initial values of position and velocity. Because both our IMM-EKF and PF could use the same nonlinear solver to generate initial position and velocity estimates, inclusion of this aspect of target tracking would not offer much insight for our comparative analysis. Instead, each model of the IMM-EKF is initialized with $\mathbf{x}_1^{\text{true}}$, the true value of the target state from the first scan, and a diagonal covariance matrix with position and velocity standard deviations of 50 m and 0.5 m/s, respectively. The initial particle set for the PF is drawn from a Gaussian distribution with mean $\mathbf{x}_1^{\text{true}}$ and the same values of standard deviation for position and velocity.$^{2}$ All tracks are initialized in the constant velocity mode of the corresponding filter. As such, both filters begin with the same initial distribution. The relatively high accuracy of the initialization is used to model the scenario where a nonmaneuvering target has been tracked for a substantial period of time. This allows us to focus on the effects of target maneuvers in our simulations.
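The particle initialization described above can be sketched in a few lines. This sketch covers only the position and velocity components and assumes that the stated standard deviations apply independently to each Cartesian axis; the remaining state components (orientation and mode) are omitted, and the function name and return layout are hypothetical.

```python
import numpy as np


def init_particles(true_pos, true_vel, n_particles,
                   pos_std=50.0, vel_std=0.5, seed=None):
    """Draw initial particle positions and velocities around the true state."""
    rng = np.random.default_rng(seed)
    true_pos = np.asarray(true_pos, dtype=float)
    true_vel = np.asarray(true_vel, dtype=float)
    # Independent Gaussian perturbations on each axis (diagonal covariance).
    pos = true_pos + pos_std * rng.standard_normal((n_particles, 3))
    vel = true_vel + vel_std * rng.standard_normal((n_particles, 3))
    weights = np.full(n_particles, 1.0 / n_particles)   # uniform initial weights
    return pos, vel, weights


pos0, vel0, w0 = init_particles([1000.0, 2000.0, 3000.0],
                                [200.0, 0.0, 0.0],
                                n_particles=20_000, seed=0)
```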

7.1.3  Generation  of  synthetic  RCS  data 

As discussed in Chapter 6, the key difference between the particle filter and the extended Kalman filter is the ability to incorporate RCS as a data feature. This ability allows us to use an aerodynamically valid state model, which links the tasks of tracking and classification through a handful of kinematic coefficients (Ac from Equation (6.10)). In Chapter 2, we demonstrated that there is no closed-form relationship between a target's RCS and its orientation with respect to the transmitter and receiver. Furthermore, in the VHF band in which we are interested, popular high-frequency approximations are invalid. Instead, Maxwell's equations must be solved using a method of moments solver such as FISC. In Chapter 2, we advocated using a program like FISC to build RCS tables for every target class. Each table would contain RCS values for various frequencies and incident/scattered directions. The extent of the incident/scattered angle space that would need to be sampled would depend on the maneuvers that each target might execute while being tracked. For example, the RCS table for a target that never rolled more than 20° would not need to include incident directions with large elevation angles because it would be impossible for FM transmitters on the ground to ever "see" the top of the plane.$^{3}$

Unfortunately, this presents a problem for us. In our simulations, we are primarily interested in maneuvering targets. Thus, our RCS tables would need to include virtually all possible combinations of incident/scattered directions. Creating RCS tables such as these for a large number of targets is a computationally daunting task. Compounding this difficulty, the CAD model, which defines the target geometry for FISC, must be designed to precise standards. CAD models of this caliber are difficult to obtain, meaning the extent of our classification experiments would be limited by the number of CAD models we could obtain, rather than the capabilities of our PF algorithm. Finally, the military nature of some of the CAD models that we do possess might preclude publication of our results in the open literature. For all these reasons, we instead seek a means of generating RCS data synthetically.

$^{2}$ The initial angular orientation of the particle filter must also be specified. For all particles, the roll angle is initialized as zero. The yaw angle for each particle is set so that the velocity vector lies in the $b_x b_z$-plane (i.e., zero sideslip angle). The pitch of each particle is drawn from a uniform distribution about the true value.

$^{3}$ This does not imply that the top of the aircraft is not involved in the scattering process. At wavelengths in the resonance region for a target, the entire surface contributes to the scattering mechanism. Rather, we mean that it is impossible for a wavefront launched from the ground to arrive at a nonbanking target from above.

In  our  simulations,  RCS  data  will  be  simulated  using  a  scattering  center  model.  It 
is  important  that  we  be  clear  on  this  point:  a  scattering  center  model  is  not  valid  in 
the  VHF  band  for  fighter-sized  aircraft  because  the  scattering  response  is  no  longer 
dominated  by  a  few  distinct  target  features  (e.g.,  edges  and  corners).  However,  this  is 
not  a  problem  for  us.  We  merely  need  a  model  that  generates  “RCS-like”  data;  whether 
our  synthetic  data  matches  the  RCS  of  any  existing  aircraft  is  not  a  concern  for  our 
simulations.  To  reiterate,  a  real  implementation  of  our  system,  charged  with  tracking 
and  classifying  real  aircraft,  would  require  an  accurate  RCS  table  for  each  target  in  the 
class  library.  However,  for  proof-of-concept,  we  really  just  need  data  with  properties 
similar  to  RCS,  and  scattering  center  models  have  been  cited  as  being  among  the  most 
useful  and  powerful  tools  in  scattering  theory.  They  provide  a  simple  physical  model 
that  accounts  for  the  lobe  structure  of  RCS  and  provides  a  basis  for  simulations  [76]. 
In  addition,  as  a  model,  they  become  increasingly  accurate  at  high  frequencies.  Of 
course,  the  question  immediately  arises  as  to  whether  our  results  will  be  meaningful 
without  the  use  of  RCS  data  from  real  aircraft.  While  the  use  of  synthetic  RCS  impacts 
our  ability  to  claim  that  our  classifier  would  work  in  the  field,  the  lack  of  actual  radar 
data  makes  it  difficult  to  know  whether  other  effects  (e.g.,  multipath  distortion)  would 
invalidate  our  results.  Thus,  for  our  simulations,  we  will  focus  on  proof-of-concept 
using  a  synthetic  model  for  RCS.  This  will  allow  us  to  create  as  many  target  classes  as 
needed. 

For our synthetic RCS model, we modify a popular parametric scattering center model [77]. Because commercial FM radio signals have narrow bandwidths, we can ignore the frequency dependence that is included in the amplitude coefficients of models based upon the geometrical theory of diffraction (GTD). Instead, we model all scattering centers as point scatterers and express the amplitude of the normalized scattered electric field (for a given polarization) as

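E_s(\hat{\mathbf{i}}_i, \hat{\mathbf{i}}_s) = \sum_{m=1}^{N_{sc}} \tilde{a}_m \exp\left\{ j \frac{2\pi}{\lambda} (\hat{\mathbf{i}}_i + \hat{\mathbf{i}}_s) \cdot \mathbf{r}_m \right\},   (7.4)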

where $N_{sc}$ is the number of scattering centers used by the model, and $\{\mathbf{r}_m, \tilde{a}_m : m = 1, \ldots, N_{sc}\}$ is the collection of their locations and complex amplitudes. The unit vectors $\hat{\mathbf{i}}_i$ and $\hat{\mathbf{i}}_s$ represent the incident and scattered (or observed) directions in the body frame of the target.$^{4}$ According to this model, each scatterer is visible from all incident and scattered directions. This is not a good approximation in the VHF band, and therefore we redefine the scattering center amplitudes as

\tilde{a}_m = a_m \exp\left\{ -\frac{(\hat{\mathbf{i}}_m \cdot \hat{\mathbf{i}}_i - 1)^2}{2\sigma_m^2} \right\} \exp\left\{ -\frac{(\hat{\mathbf{i}}_m \cdot \hat{\mathbf{i}}_s - 1)^2}{2\sigma_m^2} \right\}.   (7.5)

$^{4}$ The vector $\hat{\mathbf{i}}_i$ points towards the transmitter (i.e., opposite the direction of the incoming wavefront). The vector $\hat{\mathbf{i}}_s$ points towards the receiver (i.e., along the direction of the scattered wavefront).

Figure 7.3: Bistatic RCS in dBsm for 0° incident elevation and 0° observed elevation. The upper left panel is the RCS generated by FISC for a VFY-218 fighter. Each of the other panels was created using the scattering center model from (7.4) and (7.5) and a different set of randomly generated parameters.


Because $\hat{\mathbf{i}}_m$ is a unit vector, we have $\max_{\hat{\mathbf{i}}_i} \hat{\mathbf{i}}_m \cdot \hat{\mathbf{i}}_i = 1$. The same applies for $\hat{\mathbf{i}}_s$. Thus, each $\hat{\mathbf{i}}_m$ defines the direction in which the corresponding scattering center is maximally visible. Accordingly, each $\sigma_m$ determines the angular extent about $\hat{\mathbf{i}}_m$ in which the $m$th scattering center will contribute significantly to $E_s$. We refer to the two exponential terms in (7.5) as the incident and observed visibility factors. Because of them, individual scattering centers will blink in and out as the target's aspect with respect to the transmitter and receiver changes. The term $a_m$ in (7.5) is the typical complex scattering center amplitude found in [77]. With this modification, $E_s$ for a given target class is completely defined by $N_{sc}$ and $\{\mathbf{r}_m, a_m, \hat{\mathbf{i}}_m, \sigma_m : m = 1, \ldots, N_{sc}\}$. The synthetic RCS that we will use in our simulations is then simply $\sigma = |E_s|^2$, where $E_s$ is computed using (7.4) and (7.5).

For our simulations, we will need seven different target classes. Other than $N_{sc}$, all parameters will be generated randomly for each class. The complex amplitudes $\{a_m\}$ will be drawn uniformly from a finite range in order to upper-bound the resulting RCS. Likewise, the locations of the scatterers $\{\mathbf{r}_m\}$ will be drawn from a uniform distribution over the volume occupied by a typical fighter-sized aircraft. However, to make our synthetic RCS as realistic as possible, we will force lateral symmetry among the collection of scattering centers. This implies that only $N_{sc}/2$ of the scatterers are independent; the others are reflections of the first half through the $b_x b_z$-plane.$^{5}$
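The construction of a synthetic target class and the evaluation of its RCS can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the bounding box (taken from the F-4E dimensions), the amplitude and width ranges, the 3 m wavelength, the choice of the body $y$-axis as the lateral axis, and the phase convention of the point-scatterer sum. It is not the parameter generator used to produce the classes in this chapter.

```python
import numpy as np

MIRROR = np.array([1.0, -1.0, 1.0])   # reflection through the body x-z plane


def make_symmetric_class(n_sc, rng, half_extent=(9.6, 5.85, 2.5),
                         amp_max=1.0, sigma_range=(0.3, 1.0)):
    """Random scattering-center parameters with lateral symmetry."""
    if n_sc % 2:
        raise ValueError("n_sc must be even to form mirrored pairs")
    half = n_sc // 2
    ext = np.asarray(half_extent, dtype=float)
    r = rng.uniform(-ext, ext, size=(half, 3))             # locations (m)
    a = (rng.uniform(0.0, amp_max, half)
         * np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, half)))  # bounded amplitudes
    d = rng.standard_normal((half, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)          # visibility directions
    s = rng.uniform(sigma_range[0], sigma_range[1], half)  # visibility widths
    # Locations and directions are mirrored; amplitudes and widths are copied.
    return {"r": np.vstack([r, r * MIRROR]),
            "a": np.concatenate([a, a]),
            "i_m": np.vstack([d, d * MIRROR]),
            "sigma": np.concatenate([s, s])}


def synthetic_rcs(params, i_inc, i_obs, wavelength=3.0):
    """Synthetic RCS |E_s|^2 for given incident and observed directions."""
    i_inc = np.asarray(i_inc, dtype=float)
    i_obs = np.asarray(i_obs, dtype=float)
    i_inc = i_inc / np.linalg.norm(i_inc)
    i_obs = i_obs / np.linalg.norm(i_obs)
    two_var = 2.0 * params["sigma"] ** 2
    vis_inc = np.exp(-(params["i_m"] @ i_inc - 1.0) ** 2 / two_var)
    vis_obs = np.exp(-(params["i_m"] @ i_obs - 1.0) ** 2 / two_var)
    a_tilde = params["a"] * vis_inc * vis_obs              # visibility-weighted
    phase = np.exp(1j * 2.0 * np.pi / wavelength * (params["r"] @ (i_inc + i_obs)))
    return float(np.abs(np.sum(a_tilde * phase)) ** 2)


example_class = make_symmetric_class(12, np.random.default_rng(1))
example_rcs = synthetic_rcs(example_class, [1.0, 0.3, -0.2], [-0.5, 0.8, 0.1])
```

Because the scatterers come in mirrored pairs, reflecting both the incident and the observed direction through the symmetry plane leaves the RCS unchanged, and bounding the amplitudes bounds the RCS by the square of the summed amplitude magnitudes.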

Figure 7.3 shows the bistatic RCS in the zero-degree elevation plane for the seven classes we will need. The scattering center models each use between 12 and 18 scatterers. As a means of checking the quality of our approximation, the actual RCS of a VFY-218 fighter (as generated by FISC) is shown in the upper left panel. Although some of the models are clearly more "RCS-like" than others, they all suffer from a couple of shortcomings. First, the forward scatter phenomenon (the two bright red lines in the VFY-218 plot) is missing from the synthetic models. However, because neither of the maneuvers occurs between any of the transmitters and the receiver, this will not impact our simulations. Second, the synthetic RCS values for approximately nose-on geometries are sometimes too large. However, because we do not have a monostatic geometry, the target is never simultaneously illuminated and observed nose-on.

$^{5}$ More specifically, the vectors $\mathbf{r}_m$ and $\hat{\mathbf{i}}_m$ are reflected through the $b_x b_z$-plane. The parameters $a_m$ and $\sigma_m$ remain equal to their reflected equivalents.

In summary, even though the assumptions of the scattering center model are inappropriate in the VHF band for fighter-sized aircraft, we find that the resulting RCS plots have similar enough fine structure for use in our simulations. In the following single-target experiments, scattering center (SC) model 1 will be used to generate the necessary RCS measurements.


7.2  Tracking  Results  for  a  Single  Target 

In this section, we begin by considering the simplest scenario: tracking a single target of known identity in a clutter-free environment with $P_D = 1$. These simulations will allow us to determine the fundamental tracking capability of each system. Unfortunately, it is not as simple as running a single set of trials for each tracker. Both systems have several free parameters that must be adjusted for optimal performance. In the following discussion, we will explore the effects of these free parameters on system performance.


7.2.1  Tracking  performance  of  the  IMM  extended  Kalman  filter 

We  first  consider  the  tracking  performance  of  our  EKF  tracker.  Because  our  EKF  is 
implemented  using  the  Interacting  Multiple  Model  (IMM)  algorithm,  there  are  three 
sets  of  parameters  that  must  be  specified.  First,  we  must  decide  how  many  models  will 
be  used.  As  discussed  in  Section  5.1,  we  follow  popular  practice  and  implement  our 
filter using a constant velocity (CV) model, a constant acceleration (CA) model, and
a coordinated turn (CT) model [40, 64]. Second, we must specify the Markov transition
matrix  for  the  IMM  algorithm.  Because  limw_>o  F’kCT\u>)  =  ^1°^’  t^le  IMM  co¬ 
ordinated  turn  model  can  also  serve  as  a  constant  acceleration  model.  The  IMM-CA 
model  is  instead  used  to  model  the  brief,  but  substantial,  accelerations  present  at  the 
beginning  and  end  of  maneuvers.  In  this  case,  we  adopt  the  transition  matrix 


$$
P_{\mathrm{IMM}} =
\begin{bmatrix}
0.937 & 0.063 & 0 \\
0.125 & 0.75 & 0.125 \\
0 & 0.045 & 0.955
\end{bmatrix}
\tag{7.6}
$$


where the index-to-model mapping is 1 → CV, 2 → CA, and 3 → CT. The dwell
probability of the IMM-CV model is chosen to match Pn from (7.1). The dwell probability
of the IMM-CA model is chosen to yield an expected dwell time of 1.2 s. Finally,
the dwell probability of the IMM-CT model is selected to yield an expected dwell time
of 8.5 s. The third and final set of parameters that must be specified consists of the
process noise variances $\{q^{(CV)}, q^{(CA)}, q^{(CT)}\}$ for the three models. As mentioned
in [64], the choice of these parameters can be considered a tuning process best determined
empirically.

Table 7.2: EKF Track Accuracy versus IMM Process Noise Variances

                            Roll-Roll Maneuver      Roll-Pitch Maneuver
q^(CV)   q^(CA)   q^(CT)    e_RMS (m)     N_L       e_RMS (m)     N_L
   3        60       3       136.80        6         146.93        5
   3        60       9       139.26        2         149.74        3
   3        60      15       148.21        8         154.90        8
   3        80       3       132.09        3         140.13        5
   3        80       9       140.94        4         144.01        8
   3        80      15       140.71        5         154.60        3
   3       100       3       138.68        3         138.65        7
   3       100       9       143.47        3         146.65        3
   3       100      15       137.89        2         140.37        3
   3       120       3       136.96        3         153.61        5
   3       120       9       155.52        5         142.22        9
   3       120      15       135.33        3         146.95        2
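
The transition matrix in (7.6) and the quoted dwell times can be sanity-checked with a short Python sketch. This is only an illustration: the sample period `T_ASSUMED = 0.4` s is an assumption (it is not stated in this section), and the text does not say which dwell-time convention was used, so the sketch reports two common ones. Under the assumed period, the self-transition convention gives 1.2 s for the CA model and roughly 8.5 s for the CT model.

```python
# Transition matrix from (7.6); index order is CV, CA, CT.
P_IMM = [
    [0.937, 0.063, 0.000],
    [0.125, 0.750, 0.125],
    [0.000, 0.045, 0.955],
]

# ASSUMPTION: sample period in seconds. It is not given in this section.
T_ASSUMED = 0.4


def rows_are_stochastic(matrix, tol=1e-9):
    """Return True if every row is non-negative and sums to one."""
    return all(
        min(row) >= 0.0 and abs(sum(row) - 1.0) <= tol for row in matrix
    )


def expected_dwell(p_stay, period):
    """Expected dwell time in seconds under two common conventions.

    'visits': mean number of consecutive samples spent in the model,
              1 / (1 - p), times the sample period.
    'self_transitions': mean number of self-transitions,
              p / (1 - p), times the sample period.
    """
    return {
        "visits": period / (1.0 - p_stay),
        "self_transitions": period * p_stay / (1.0 - p_stay),
    }


if __name__ == "__main__":
    print("rows sum to one:", rows_are_stochastic(P_IMM))
    for name, row, idx in (("CV", P_IMM[0], 0), ("CA", P_IMM[1], 1), ("CT", P_IMM[2], 2)):
        print(name, expected_dwell(row[idx], T_ASSUMED))
```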

In order to find good values for $\{q^{(CV)}, q^{(CA)}, q^{(CT)}\}$, we will run 100 Monte
Carlo simulations for each maneuver in Figure 7.1 and various choices of the three
noise variances. The number of combinations that must be considered can be limited
by noting that the process noise standard deviation of each IMM model should be on the
order of the maximum change in acceleration during a sample period in which the target
operates in the mode specified by the model. Thus, because the IMM-CV model corresponds
to nominally zero acceleration, $q^{(CV)}$ can be expected to be relatively small. On the
other hand, because the IMM-CA model corresponds to maneuver transients, $q^{(CA)}$
will need to be substantially larger. Finally, because the centripetal acceleration of a
coordinated turn is already incorporated in the structure of $F^{(CT)}(\omega)$, $q^{(CT)}$ does not
need to be very large. However, because the IMM-CT model must also match sustained
longitudinal accelerations such as the pitch/thrust portion of the roll-pitch maneuver, a
good choice for $q^{(CT)}$ will be a trade-off.

Table  7.2  displays  the  results  of  12  different  process  noise  combinations.  Overall, 
we  find  that  the  IMM-EKF  performs  better  against  the  roll-roll  maneuver  than  the  roll- 
pitch  maneuver.  This  is  not  surprising  because  the  IMM-EKF  includes  a  coordinated 
turn  model.  However,  because  the  EKF  cannot  model  the  true  dynamics  of  air  flight, 
it  does  not  perform  as  well  against  a  climbing  or  descending  target  such  as  during  the 
pitch/thrust  portion  of  the  roll-pitch  maneuver.  In  terms  of  the  IMM  noise  variances, 
the best choice of $q^{(CT)}$ seems to vary directly with $q^{(CA)}$. As $q^{(CA)}$ grows, the
IMM-CA model becomes an increasingly poor match to any portion of either maneuver, and
$q^{(CT)}$ must increase proportionately in order for the IMM-CT model to serve as both a
coordinated turn and constant acceleration model. Because we want the filter models to
operate as designed, we select the combination $q^{(CV)} = 3$, $q^{(CA)} = 80$, and
$q^{(CT)} = 3$ as providing the best overall tracking performance for both maneuvers. We
also note that the number of lost tracks is relatively small for this choice. From here
forward, all results for the IMM-EKF will be given with respect to this choice of noise
variances. It should also be mentioned that the computational complexity of the IMM-EKF
is largely independent of the values selected for $q^{(CV)}$, $q^{(CA)}$, and $q^{(CT)}$. The
average run-time for the filter was approximately 4 s for all of the combinations listed
in Table 7.2.

Figure 7.4: Example of tracking performance of the IMM-EKF during the roll-pitch
maneuver. Red lines are true values; blue lines are IMM-EKF estimates. Vertical black
dotted lines indicate changes in the target’s flight mode.
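
The selection above can be reproduced from the entries of Table 7.2 with a small Python sketch. The text does not state the exact selection rule, so ranking by the mean RMS error across the two maneuvers is an assumption made here for illustration; under that rule the combination (3, 80, 3) comes out first.

```python
# Rows of Table 7.2:
# (q_cv, q_ca, q_ct, rms_roll_roll_m, lost_roll_roll, rms_roll_pitch_m, lost_roll_pitch)
TABLE_7_2 = [
    (3, 60, 3, 136.80, 6, 146.93, 5),
    (3, 60, 9, 139.26, 2, 149.74, 3),
    (3, 60, 15, 148.21, 8, 154.90, 8),
    (3, 80, 3, 132.09, 3, 140.13, 5),
    (3, 80, 9, 140.94, 4, 144.01, 8),
    (3, 80, 15, 140.71, 5, 154.60, 3),
    (3, 100, 3, 138.68, 3, 138.65, 7),
    (3, 100, 9, 143.47, 3, 146.65, 3),
    (3, 100, 15, 137.89, 2, 140.37, 3),
    (3, 120, 3, 136.96, 3, 153.61, 5),
    (3, 120, 9, 155.52, 5, 142.22, 9),
    (3, 120, 15, 135.33, 3, 146.95, 2),
]


def mean_rms(row):
    """Mean of the roll-roll and roll-pitch RMS errors (an assumed criterion)."""
    return (row[3] + row[5]) / 2.0


def rank_by_mean_rms(rows):
    """Return the rows sorted from lowest to highest mean RMS error."""
    return sorted(rows, key=mean_rms)


if __name__ == "__main__":
    for row in rank_by_mean_rms(TABLE_7_2)[:3]:
        print(row[:3], round(mean_rms(row), 2), "lost tracks:", row[4] + row[6])
```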

To understand the shortcomings of the IMM-EKF against the types of maneuvers
we  are  considering,  it  is  helpful  to  look  at  the  results  from  a  single  Monte  Carlo  run. 
Figure  7.4  shows  the  filtered  estimates  of  target  velocity  and  acceleration  from  one  of 
the  100  roll-pitch  experiments  conducted  for  the  optimal  combination  of  IMM  noise 
variances  selected  above.  The  top  row  depicts  the  components  of  target  velocity  in 
a  receiver-centered  Earth-fixed  system  (from  left  to  right,  the  component  directions 
are  east,  north,  and  up).  The  same  applies  to  the  bottom  row,  which  depicts  target 
acceleration.  The  vertical  dotted  lines  indicate  transitions  in  the  flight  mode  of  the 
target.  Referring  back  to  Figure  7.1,  we  see  that  the  roll-pitch  maneuver  consists  of  the 
following  sequence  of  modes:  CV,  CT,  CV,  PT,  CV. 

Because  FM-band  passive  radar  provides  high  accuracy  measurements  of  Doppler 
shift,  we  expect  the  IMM-EKF  estimate  of  velocity  to  be  very  good.  While  this  is 
clearly true of $v_E$ and $v_N$ in Figure 7.4, the filtered estimate of $v_U$, the vertical
velocity of the target, is quite poor. To understand why this is so, we refer back to
(5.14), which defines the relationship between the velocity vector and Doppler shift. We
see that the observability of the velocity vector during a given scan depends critically
on the orientation of the surface normal $\hat{n}_{t,k}(p_k)$ defined in (5.15). For both
trajectories, because the altitude of the target is relatively small compared to the
distance to the receiver or transmitters, $\hat{n}_{t,k}(p_k)$ will be nearly parallel to the
ground plane for all $k$. This means that changes in vertical velocity will be more
difficult to detect through Doppler shift than corresponding changes in horizontal
velocity (i.e., $v_E$ or $v_N$). This is precisely what we see in Figure 7.4. The solution
to this shortcoming is to track changes in target pitch, which cause changes in vertical
velocity. Thus, although changes in $v_U$ will be similarly difficult to detect through
Doppler shift for a particle filter, changes in pitch will be detectable via RCS. When
this is coupled with an aerodynamically valid flight model, we will find that our
particle filter provides superior estimation of $v_U$.
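
The geometric argument can be illustrated numerically. The sketch below assumes, as a simplification, that the Doppler shift is proportional to the projection of the velocity vector onto a unit normal; Equations (5.14) and (5.15) are not reproduced here, and the 3° elevation angle is a hypothetical value chosen only to represent a normal that is nearly parallel to the ground plane.

```python
import math


def unit_normal(azimuth_deg, elevation_deg):
    """Unit vector in east-north-up coordinates.

    Azimuth is measured from north towards east; elevation is measured
    up from the ground plane. Both angles are hypothetical inputs.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (
        math.cos(el) * math.sin(az),  # east component
        math.cos(el) * math.cos(az),  # north component
        math.sin(el),                 # up component
    )


def projected_speed(velocity_enu, normal_enu):
    """Dot product of a velocity (m/s) with a unit normal."""
    return sum(v * n for v, n in zip(velocity_enu, normal_enu))


if __name__ == "__main__":
    n = unit_normal(azimuth_deg=90.0, elevation_deg=3.0)
    # The same 10 m/s velocity change, applied vertically and then horizontally.
    d_up = projected_speed((0.0, 0.0, 10.0), n)
    d_east = projected_speed((10.0, 0.0, 0.0), n)
    print("projection of a vertical change:  ", d_up)
    print("projection of a horizontal change:", d_east)
```

For a small elevation angle, the ratio of the two projections is the tangent of that angle, which is why a vertical velocity change leaves only a faint Doppler signature in this simplified picture.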

The bottom row in Figure 7.4 verifies that the magnitudes of the IMM noise variances
are adequate to track the true acceleration curves. However, we note a significant
lag in the filter’s estimates of $a_E$ and $a_N$ at the onset of the target roll (the second
segment of the trajectory). This is not surprising because the IMM-EKF only measures
target velocity (through Doppler shift). Thus, with the onset of a new force acting upon
the target, the resulting change in acceleration goes undetected until the target’s
horizontal velocity changes considerably. In the next section, we will find that the change
in RCS accompanying a target roll allows the particle filter to track changes in
acceleration more quickly. We defer analysis of the IMM model probabilities until the next
section, where they can be directly compared to the particle filter’s mode probabilities.

7.2.2  Tracking  performance  of  the  particle  filters 

In  this  section,  we  quantify  the  tracking  performance  of  both  a  generic  particle  filter 
(PF)  that  uses  the  prior  as  proposal  distribution  and  the  auxiliary  particle  filter  (APF) 
described  in  Section  4.3.4.  As  with  the  IMM-EKF,  we  will  perform  100  Monte  Carlo 
trials  for  each  of  the  maneuvers  in  Figure  7.1.  The  process  noise  parameters  for  the 
particle  filter  state  model  were  already  presented  in  Table  7.1.  However,  because  the 
particle  filter  can  incorporate  RCS,  we  now  need  to  specify  the  variance  of  the  Gaussian 
noise  added  to  the  in-phase  and  quadrature  components  of  the  scattering  coefficient. 
For now, we take $\sigma_k^2 = 8.0\ \mathrm{m}^2$ for all $k$ and postpone discussion about the
validity of this assumption until the next section. The only parameter left to specify
then is $N_p$, the number of particles. Similar to the IMM noise variances, this can be
considered a tuning parameter that is best determined empirically. Because computational
complexity scales linearly with $N_p$, we wish to find the smallest value that yields good
tracking performance.
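
To make the role of $N_p$ concrete, the following Python sketch shows one cycle of a generic (bootstrap) particle filter that uses the prior as proposal distribution. It is a toy under stated assumptions, not the tracker described in this thesis: the state is a scalar random walk, the measurement is the state plus Gaussian noise, and the names `bootstrap_pf_step` and `run_toy_filter` are hypothetical. Each of the three steps touches every particle once, which is where the linear dependence of the cost on $N_p$ comes from.

```python
import math
import random


def bootstrap_pf_step(particles, z, q_std, r_std, rng):
    """One predict/update/resample cycle of a bootstrap particle filter.

    Toy model (an assumption for illustration only):
        x_k = x_{k-1} + process noise,   z_k = x_k + measurement noise.
    """
    n = len(particles)
    # 1. Propose from the prior p(x_k | x_{k-1}); the measurement is not used.
    proposed = [x + rng.gauss(0.0, q_std) for x in particles]
    # 2. Weight each proposal by the likelihood p(z_k | x_k).
    weights = [math.exp(-0.5 * ((z - x) / r_std) ** 2) for x in proposed]
    total = sum(weights)
    if total == 0.0:
        # Every particle is far from z; fall back to uniform weights.
        weights = [1.0 / n] * n
    else:
        weights = [w / total for w in weights]
    # 3. Multinomial resampling returns equally weighted particles.
    return rng.choices(proposed, weights=weights, k=n)


def run_toy_filter(num_particles, num_steps=20, seed=1):
    """Track a constant state of 5.0 and return the final particle mean."""
    rng = random.Random(seed)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(num_particles)]
    for _ in range(num_steps):
        particles = bootstrap_pf_step(particles, 5.0, 0.5, 1.0, rng)
    return sum(particles) / len(particles)


if __name__ == "__main__":
    for count in (200, 400, 800):
        print(count, run_toy_filter(count))
```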

Table 7.3 contains results quantifying the tracking performance of both the generic
particle filter and the auxiliary particle filter for many different choices of $N_p$. Four
comments are in order. First, as expected, both the RMS error and the number of lost
tracks decrease as $N_p$ increases. There are a few entries in the table where this is not
the case, but these can be attributed to the variation inherent in Monte Carlo simulations.

Second, we find that the auxiliary particle filter meets or exceeds the tracking
accuracy of the generic particle filter for both maneuvers on all but a single value of $N_p$
($N_p = 200$ for the roll-roll maneuver). This is not surprising either because the generic
particle filter uses the prior as proposal distribution,
$\pi(x_k \mid x_{k-1}^{(i)}, z_k) = p(x_k \mid x_{k-1}^{(i)})$.
As such, it operates without any knowledge of the current measurement. The auxiliary
particle filter, on the other hand, takes
$\pi(x_k, i \mid z_{1:k}) \propto p(z_k \mid \mu_k^{(i)})\, p(x_k \mid x_{k-1}^{(i)})\, w_{k-1}^{(i)}$
and is thus able to “steer” its particles towards promising areas of the state space based
upon $z_k$ (see Section 4.3.4).

A third observation from Table 7.3 is that the roll-pitch maneuver is more difficult
to track than the roll-roll maneuver. This can be understood by referring back to the
command variable ranges in Table 7.1. We see that the PT and CT modes both have two
command variables (i.e., two variables that largely determine the flight path until the
next mode transition). However, closer inspection shows that the command variable
controlling the CT pitch increase has a very small range. (Otherwise, it would be a
climbing or descending turn.) In essence, a particle filter primarily needs to accurately
determine $\omega$, the turn-rate, at the onset of a coordinated turn; the precise value of the
CT pitch increase has little effect on the resulting trajectory. On the other hand, the
pitch change for the PT mode can be considerable, and therefore, both of the command
variables for the PT mode must be accurately estimated. In simple terms, there is more
variability in the pitch/thrust mode than the coordinated turn. This is compounded by the
fact that the change in target aspect accompanying the CV → PT transition is not nearly
as substantial as that of the CV → CT transition. Thus, RCS is typically a better
indicator of the onset of a coordinated turn than a climb or descent.

The fourth and final observation to be made from Table 7.3 concerns its comparison
with the IMM-EKF error statistics in Table 7.2. Every particle filter entry in Table 7.3
has a lower RMS error than the IMM-EKF configuration selected in Section 7.2.1, and for
$N_p = 600$ both particle filters yield at least a 33% reduction in RMS error on both
maneuvers. In addition, for all but $N_p = 200$ on the roll-pitch maneuver, they also
result in fewer lost tracks.

Table 7.3: PF Tracking Accuracy versus Particle Count

                                    Roll-Roll Maneuver     Roll-Pitch Maneuver
Filter                       Np     e_RMS (m)    N_L       e_RMS (m)    N_L
Generic Particle Filter      200      92.64       0         122.93       7
Generic Particle Filter      300      92.03       0         114.53       4
Generic Particle Filter      400      91.80       0         107.94       1
Generic Particle Filter      600      86.78       0          92.31       0
Generic Particle Filter      800      80.70       0          97.84       0
Auxiliary Particle Filter    200      94.91       0         112.06       5
Auxiliary Particle Filter    300      86.92       0         107.12       3
Auxiliary Particle Filter    400      80.25       0          91.77       3
Auxiliary Particle Filter    600      86.47       0          92.10       0
Auxiliary Particle Filter    800      76.61       0          89.93       0
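
The comparison between Tables 7.2 and 7.3 can be reproduced with a few lines of Python. The sketch below only restates numbers already given in the two tables; the baseline is the IMM-EKF configuration selected in Section 7.2.1, and the helper names are hypothetical.

```python
# RMS errors (m) of the selected IMM-EKF configuration from Table 7.2,
# ordered as (roll-roll, roll-pitch).
EKF_RMS = (132.09, 140.13)

# Table 7.3: (filter, Np) -> (roll-roll RMS, roll-pitch RMS) in metres.
PF_RMS = {
    ("generic", 200): (92.64, 122.93),
    ("generic", 300): (92.03, 114.53),
    ("generic", 400): (91.80, 107.94),
    ("generic", 600): (86.78, 92.31),
    ("generic", 800): (80.70, 97.84),
    ("auxiliary", 200): (94.91, 112.06),
    ("auxiliary", 300): (86.92, 107.12),
    ("auxiliary", 400): (80.25, 91.77),
    ("auxiliary", 600): (86.47, 92.10),
    ("auxiliary", 800): (76.61, 89.93),
}


def reduction_percent(pf_rms, ekf_rms):
    """Percentage reduction in RMS error relative to the IMM-EKF."""
    return 100.0 * (ekf_rms - pf_rms) / ekf_rms


def reductions(table, baseline):
    """Map each table key to its (roll-roll, roll-pitch) reductions."""
    return {
        key: tuple(reduction_percent(v, b) for v, b in zip(values, baseline))
        for key, values in table.items()
    }


if __name__ == "__main__":
    for key, (rr, rp) in sorted(reductions(PF_RMS, EKF_RMS).items()):
        print(key, f"roll-roll {rr:.1f}%  roll-pitch {rp:.1f}%")
```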

Upon  first  thought,  it  might  not  seem  surprising  that  both  particle  filters  outperform 
the  IMM-EKF;  after  all,  they  use  the  same  motion  model  as  the  simulated  trajectories. 
Two  comments  should  be  made,  though.  First,  as  depicted  in  Table  7.1,  the  extent  of 
the  state  space  in  which  the  particle  filters  operate  is  considerably  larger  than  the  extent 
of  the  state  space  in  which  the  roll-roll  and  roll-pitch  maneuvers  exist.  Second,  and 
more  importantly,  we  recall  that  the  motion  model  was  designed,  first  and  foremost, 
to  produce  aerodynamically  valid  trajectories.  Thus,  it  is  not  that  the  data  has  been 
fashioned to work well with a particle filter. Rather, the flexibility of the sequential
importance sampling framework has allowed us to implement a valid state model for
flight.

Figure 7.5: Example of the tracking performance of the generic particle filter with
$N_p = 600$. Red lines are true values; blue lines are particle filter estimates. Vertical
black dotted lines indicate changes in the target’s flight mode.

Figure  7.5  depicts  several  estimates  produced  by  the  generic  particle  filter  with 
Np  =  600  for  one  of  its  roll-pitch  Monte  Carlo  trials.  The  top  row  is  target  velocity  in 
receiver-centered  east-north-up  coordinates.  The  bottom  row  is  the  Euler  angles  needed 
to  bring  the  inertial  frame  into  coincidence  with  the  body-centered  frame.  Comparing 
the top rows of Figures 7.4 and 7.5, we find that the particle filter is as adept as
the IMM-EKF at tracking $v_E$ and $v_N$. However, the particle filter is noticeably better
at tracking $v_U$, the target’s vertical velocity. Recall that this component of the velocity
vector is typically much more difficult to measure through Doppler shift than horizontal
velocity. Inclusion of RCS in the measurement vector allows the particle filter to detect
the change in pitch, which causes the change in $v_U$. In the roll-pitch maneuver, a sharp
decrease in pitch at 18 s results in a decrease in the angle of attack, which reduces the
magnitude of the lift vector. The natural result: the target begins to descend. We note
that the estimation error in $v_U$ during the target’s descent (from $t = 18$ s to 24 s) is
caused by the 4-5° error in the filter’s pitch estimate during the same period. Depending on
many  factors,  such  as  target  geometry  and  orientation  with  respect  to  the  transmitters 
and  receiver,  a  discrepancy  of  such  small  magnitude  may  not  be  detectable  through 
RCS. 

We  have  identified  one  of  the  reasons  for  the  particle  filter’s  superior  performance: 
more  accurate  estimation  of  vertical  velocity.  Now,  we  highlight  a  second  reason  for 
the  improvement  in  tracking  accuracy.  In  Figure  7.6,  we  display  examples  of  both 
the  IMM  model  probabilities  and  the  particle  filter  mode  probabilities  for  the  roll-roll 
and  roll-pitch  maneuvers.  It  is  important  to  remember  that  models  are  labeled  based 
upon  the  flight  mode  they  are  intended  to  represent.  The  underlying  models  are  not 
the same, though. For example, even though the IMM-EKF and PF both have CV
models for tracking constant velocity flight, these models differ in significant ways.
The particle filter’s CV model requires that the bank angle be zero. The CV model for
the IMM-EKF, on the other hand, places no requirements on bank angle because the
EKF has no practical way of measuring target orientation. The same remark applies to
the coordinated turn models of the IMM-EKF and PF.

Figure 7.6: Example of mode probabilities for the IMM-EKF and the generic particle
filter ($N_p = 600$). Left column: roll-roll maneuver. Right column: roll-pitch maneuver.
Vertical black dotted lines indicate changes in the target’s flight mode.
[Panel title recovered from the figure: Generic PF (noise_var=8).]

Table 7.4: Transition Times and Mode Sequences for Both Trajectories

Roll-Roll     Time (s)    0      4.0     13.6    20.8    -
              Mode        CV     CT      CV      CT      -
Roll-Pitch    Time (s)    0      2.8     14.0    18.0    24.0
              Mode        CV     CT      CV      PT      CV

In  Figure  7.6,  the  vertical  black  dotted  lines  indicate  times  at  which  mode  changes 
occur.  To  assist  our  analysis,  the  correct  mode  sequence  and  transition  times  are  listed 
in  Table  7.4.  Comparing  the  top  row  of  Figure  7.6  with  the  entries  in  Table  7.4,  we  find 
that  the  IMM  models  are  operating  as  designed  for  both  trajectories.  First,  all  constant 
velocity  segments  are  correctly  detected.  Second,  the  IMM-CA  model  (whose  variance 
we  tuned  in  Section  7.2.1)  only  has  significant  probability  during  mode  transitions,  as 
designed.  Finally,  the  IMM-CT  model  is  the  most  probable  during  both  coordinated 
turns  and  climbs/descents  (e.g.,  the  segment  beginning  at  t  =  18  s  of  the  roll-pitch 
maneuver). This is to be expected because the IMM-CT model reduces to a constant
acceleration model as $\omega$ approaches zero.

Comparing the bottom row of Figure 7.6 with the entries in Table 7.4, we find that
the particle filter is also performing as designed. Notably, because the particle filter
incorporates a pitch/thrust (PT) mode for climbs and descents, the PT segment of the
roll-pitch maneuver is correctly identified. While Figure 7.6 shows that both filters are
operating properly, it clearly identifies a second reason for the particle filter’s superior
performance: a shorter time lag before a mode transition is detected. Comparing the delays
between transitions in Table 7.4 (the black dotted lines in the plots) and changes in
each filter’s mode probabilities, we find that the particle filter is substantially faster
than the IMM-EKF at detecting the onsets of new flight modes. Two specific examples
of this are the CT → CV transition at 13.6 s of the roll-roll maneuver and the CV → CT
transition at 2.8 s of the roll-pitch maneuver. While the particle filter should detect
transitions involving the PT mode more quickly (because the IMM-EKF does not have
a legitimate pitch/thrust model), the two examples we have highlighted involve flight
modes for which the IMM-EKF has been optimized.

Table 7.5: Average Run-Time for the Roll-Roll Maneuver versus Particle Count

Np      Generic PF Time (s)     Auxiliary PF Time (s)
200          129.8                    264.9
300          201.0                    386.8
400          267.0                    515.2
600          400.9                    777.1
800          535.1                   1036.4

The  reason  for  this  discrepancy  in  performance  between  the  IMM-EKF  and  the 
PF  is  easy  to  understand.  A  change  in  target  orientation  alters  the  net  force  on  the 
target  and  eventually  results  in  a  change  in  velocity.  Because  the  IMM-EKF  cannot 
incorporate  RCS  measurements,  it  must  detect  changes  in  flight  mode  through  changes 
in target velocity. As such, the IMM-EKF is “one integration” removed from the source
of  the  mode  transition.  The  particle  filter,  on  the  other  hand,  is  able  to  use  RCS  to  detect 
the  changes  in  target  aspect  that  are  present  at  the  very  beginning  of  a  new  flight  mode. 
This  translates  into  faster  mode  detection,  which  provides  superior  estimation  accuracy. 

So  far  in  this  section,  we  have  evaluated  the  tracking  performance  of  both  a  generic 
particle filter and an auxiliary particle filter. We have found that both types outperform
the IMM-EKF. The two main reasons for the superior performance are better
estimation of vertical velocity and faster mode detection following transitions. Both
improvements can be attributed to the particle filter’s ability to incorporate RCS as a
data feature, which permits the use of an aerodynamically valid state model. Of course,
engineering  is  full  of  trade-offs,  and  the  price  we  pay  for  the  flexibility  of  the  sequential 
importance  sampling  framework  is  a  substantial  increase  in  computation.  Table  7.5  lists 
the  average  times  per  trial  as  a  function  of  Np  on  the  roll-roll  maneuver.  As  expected, 
the  run-times  of  both  PF  algorithms  scale  linearly  with  Np.  We  note  that  the  auxiliary 
particle  filter  requires,  on  average,  just  under  twice  as  much  time  as  the  generic  particle 
filter.  If  we  revisit  the  pseudo-code  description  of  the  APF  algorithm  in  Figure  4.7, 
we  find  that  each  iteration  of  the  APF  is  actually  equivalent  to  two  complete  iterations 
of the generic particle filter. However, incorporation of the current measurement in
the proposal distribution yields particle weights that are more uniform. Because
resampling then occurs less frequently (i.e., after the second measurement update), the
APF is slightly less than twice as slow as the generic PF. Finally,
we  recall  that  the  IMM-EKF  had  an  average  run-time  of  approximately  4  s  per  trial. 
The  generic  particle  filter  with  Np  =  600  (whose  results  are  displayed  in  Figure  7.5) 
requires  400  s  per  run  —  two  orders  of  magnitude  greater  than  the  IMM-EKF! 
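
The scaling claims can be checked against the entries of Table 7.5 with the Python sketch below. Fitting a line through the origin is an assumption made here for simplicity (it ignores any fixed overhead per trial), and the function names are hypothetical.

```python
# Table 7.5: Np -> (generic PF seconds, auxiliary PF seconds) per trial.
RUNTIME_S = {
    200: (129.8, 264.9),
    300: (201.0, 386.8),
    400: (267.0, 515.2),
    600: (400.9, 777.1),
    800: (535.1, 1036.4),
}


def slope_through_origin(xs, ys):
    """Least-squares slope of y = slope * x (no intercept term)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def max_relative_residual(xs, ys, slope):
    """Largest relative deviation of the data from the fitted line."""
    return max(abs(y - slope * x) / y for x, y in zip(xs, ys))


COUNTS = sorted(RUNTIME_S)
GENERIC = [RUNTIME_S[n][0] for n in COUNTS]
AUXILIARY = [RUNTIME_S[n][1] for n in COUNTS]
RATIOS = [a / g for a, g in zip(AUXILIARY, GENERIC)]

if __name__ == "__main__":
    slope = slope_through_origin(COUNTS, GENERIC)
    print("generic PF seconds per particle:", round(slope, 4))
    print("worst relative misfit:", round(max_relative_residual(COUNTS, GENERIC, slope), 4))
    print("APF / generic ratios:", [round(r, 3) for r in RATIOS])
    print("mean ratio:", round(sum(RATIOS) / len(RATIOS), 3))
```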

Two  comments  need  to  be  made  on  this  point.  First,  we  must  remember  that  the 
particle  filter  is  actually  doing  two  things  simultaneously:  tracking  and  classifying. 
Even  though  we  have  only  discussed  tracking  performance  for  a  single  known  target, 
the filter would operate exactly the same if the target class were unknown (see footnote 6).
Second, because of the substantial difference in computation, it might seem that our
performance comparisons would be fairer if we included additional models in our IMM-EKF and
thereby narrowed the run-time gap. Unfortunately, this is unlikely to lead to better
performance for the IMM-EKF. Simply put, the EKF cannot use the correct motion model
because  it  has  no  way  of  measuring  target  aspect.  Including  additional  mismatched 
models  is  unlikely  to  correct  for  this  fundamental  deficiency. 

Choice  of  scattering  coefficient  noise  variance 

In this section, we briefly address our choice for the variance of the additive Gaussian
noise that affects the measurement of the in-phase and quadrature components of
the scattering coefficient. All of the experimental results presented so far have taken
$\sigma^2 = 8\ \mathrm{m}^2$. Using Equations (6.22) and (6.23), the noncentral chi-squared distribution
for RCS then has an expected value of $|s_k|^2 + 16$ and a variance of $32(|s_k|^2 + 8)$. For
large values of RCS, the measurement error can clearly be substantial. The question
then becomes, “Is this a realistic value for $\sigma^2$?” Unfortunately, there is no easy answer
for this question. Because current passive radars do not attempt to perform joint tracking
and classification, they have no need to extract RCS as a data feature. As such,
the current literature has little to offer on this matter. Instead, we will investigate the
influence of $\sigma^2$ on the tracking performance of the generic particle filter. If it happens
that the filter is not especially sensitive to the value of $\sigma^2$, then the exact value that is
used in our experiments becomes less important.
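
The quoted mean and variance follow from the standard moments of $|s_k + n|^2$ when independent zero-mean Gaussian noise of variance $\sigma^2$ is added to each of the in-phase and quadrature components: the mean is $|s_k|^2 + 2\sigma^2$ and the variance is $4\sigma^2(|s_k|^2 + \sigma^2)$. The Python sketch below evaluates these expressions and compares them with a Monte Carlo estimate. Equations (6.22) and (6.23) are not reproduced here, and the scattering coefficient value used in the example is hypothetical.

```python
import random


def rcs_moments(s_mag_sq, sigma_sq):
    """Closed-form mean and variance of |s + n|^2.

    n has independent zero-mean Gaussian I and Q parts, each of variance sigma_sq.
    """
    mean = s_mag_sq + 2.0 * sigma_sq
    variance = 4.0 * sigma_sq * (s_mag_sq + sigma_sq)
    return mean, variance


def simulate_rcs(s_i, s_q, sigma_sq, num_samples, seed=0):
    """Monte Carlo estimate of the mean and variance of the noisy RCS."""
    rng = random.Random(seed)
    std = sigma_sq ** 0.5
    samples = [
        (s_i + rng.gauss(0.0, std)) ** 2 + (s_q + rng.gauss(0.0, std)) ** 2
        for _ in range(num_samples)
    ]
    mean = sum(samples) / num_samples
    variance = sum((x - mean) ** 2 for x in samples) / (num_samples - 1)
    return mean, variance


if __name__ == "__main__":
    # Hypothetical scattering coefficient 6 + 8j, so |s|^2 = 100 m^2.
    print("closed form:", rcs_moments(100.0, 8.0))
    print("simulated:  ", simulate_rcs(6.0, 8.0, 8.0, 100000))
```

With $\sigma^2 = 8$, the closed-form expressions reduce to $|s_k|^2 + 16$ and $32(|s_k|^2 + 8)$, matching the values in the text.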

In Table 7.6, tracking results for the generic particle filter (with $N_p = 600$) are
presented for the roll-roll and roll-pitch trajectories using several different values of
$\sigma^2$. As usual, 100 Monte Carlo trials were run for each entry in the table. For the
roll-roll maneuver, we find that the tracking performance does not seem to vary greatly with
the value of $\sigma^2$. This occurs because, as the RCS measurements become increasingly
noisy, the particle filter can still fall back upon delay and Doppler data for tracking.
The entries for the roll-pitch maneuver in Table 7.6 exhibit a strange trend, though. As
$\sigma^2$ increases, tracking performance initially improves! After $\sigma^2 = 8$, performance then
begins to degrade, as expected. To understand why less noise could actually degrade
track accuracy, we plot the mode probabilities from one of the $\sigma^2 = 2$ simulations in
Figure 7.7.

Table 7.6: Generic PF Tracking Accuracy versus $\sigma^2$

                  Roll-Roll Maneuver     Roll-Pitch Maneuver
sigma^2 (m^2)     e_RMS (m)    N_L       e_RMS (m)    N_L
     2              85.85       0         110.26       7
     4              89.87       0         102.50       0
     8              86.78       0          92.31       0
    16              79.39       0          94.21       0
    32              86.02       0          99.29       1

Figure 7.7: Example of the mode probabilities for the generic particle filter with
$N_p = 600$ and $\sigma^2 = 2$. Left plot: roll-roll maneuver. Right plot: roll-pitch maneuver.

Footnote 6: In terms of code, the known-identity scenario is created by initializing all
particles with the same (correct) target class. If the target class were unknown, the
particles would be initialized with equal numbers devoted to each potential class. From an
algorithmic perspective, there is no difference between the two cases.


Comparing these two plots with those from the bottom row of Figure 7.6, we immediately
notice a difference. In the second CT segment of the roll-roll maneuver
(20.8-30 s) and the only CT segment of the roll-pitch maneuver (2.8-14.0 s), the particle
filter with $\sigma^2 = 2$ incorrectly “chatters” between the CT and CV modes. This
occurs because the low value of $\sigma^2$ favors trajectories that, first and foremost, match
the RCS measurements, even if they are unrealistic from a kinematic perspective. For
example, during a turn, it could be that a noisy RCS measurement happens to match
CV particles (whose bank angles are zero) better than CT particles (whose bank angles
are correct). This causes the CV flight mode to increase in probability. A few samples
later, when it becomes clear through RCS or Doppler that the target is still turning, the
CT mode will return to prominence. This can occur because $N_p$, the PF sample size,
is finite. As such, the limited number of CT mode particles may provide inadequate
coverage of the range of possible bank angles. Because we want the particle filter to
operate as intended (i.e., without chattering between modes), we will avoid small values
for $\sigma^2$. Instead, we select $\sigma^2 = 8$ as a reasonable value that minimizes occurrence
of the phenomenon seen in Figure 7.7 while still providing RCS measurements with
enough fidelity to be useful for tracking and classification.
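
The sensitivity to a small noise variance can be seen in a simplified likelihood comparison. The Python sketch below assumes that the measurement is the complex scattering coefficient itself with Gaussian noise on its in-phase and quadrature components; the thesis measurement is RCS, whose likelihood is noncentral chi-squared, so this is only a qualitative illustration. The residual values are hypothetical.

```python
import math


def log_likelihood_ratio(resid_a, resid_b, sigma_sq):
    """log p(z | A) - log p(z | B) for Gaussian I/Q noise of variance sigma_sq.

    resid_x is the (I, Q) difference between the measurement and the value
    predicted by hypothesis x. The normalizing constants cancel in the ratio.
    """
    sq_a = resid_a[0] ** 2 + resid_a[1] ** 2
    sq_b = resid_b[0] ** 2 + resid_b[1] ** 2
    return (sq_b - sq_a) / (2.0 * sigma_sq)


if __name__ == "__main__":
    # Hypothetical case: one noisy measurement happens to sit closer to a
    # CV particle's prediction than to a CT particle's prediction.
    cv_residual = (1.0, 1.0)
    ct_residual = (3.0, 2.0)
    for sigma_sq in (2.0, 8.0, 32.0):
        llr = log_likelihood_ratio(cv_residual, ct_residual, sigma_sq)
        print(f"sigma^2 = {sigma_sq:>4}: weight ratio CV/CT = {math.exp(llr):.2f}")
```

Because the log-likelihood ratio scales as $1/\sigma^2$, the same chance mismatch that mildly favors the wrong mode at $\sigma^2 = 8$ favors it far more strongly at $\sigma^2 = 2$, which is consistent with the chattering described above.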

Choice of scattering center model

We conclude our experiments for a single target of known identity by investigating the
effect of our choice of scattering center model on tracking performance. Figure 7.3
shows the bistatic RCS for seven different scattering center models with randomly
generated parameters. In this section, we wish to make sure that the superior tracking
performance displayed by the particle filter is not caused by a fortuitous choice of
scattering center model parameters. Table 7.7 lists the results of 100 trials for each
combination of trajectory and scattering center model, run using the generic particle
filter with $N_p = 600$ and $\sigma^2 = 8$. For a given maneuver (roll-roll or roll-pitch), the
target follows the same flight path and uses the same kinematic coefficients. The only
difference is the parameters that are used to synthesize the RCS measurements. Comparing
the entries in the table, we find that, while SC 1 (the model that we have used
in previous sections) performs quite well on the roll-pitch maneuver, its performance
on the roll-roll maneuver is actually below average. More importantly, comparing the
results in Tables 7.2 and 7.7, we note that the generic particle filter outperforms the
IMM-EKF, regardless of the choice of scattering center model. Thus, the improvement
in performance achieved by the particle filter is indeed related more to its
aerodynamically correct state model than the specific nature of the target’s radar cross section.

Table 7.7: Generic PF Tracking Accuracy versus Scattering Center Model

              Roll-Roll Maneuver     Roll-Pitch Maneuver
SC Model      e_RMS (m)    N_L       e_RMS (m)    N_L
   1            86.78       0          92.31       0
   2            81.06       0          93.92       0
   3            82.25       1         100.84       2
   4            85.13       0         106.23       0
   5            84.99       1         102.34       0
   6            87.11       1         101.15       0
   7            83.53       0         102.02       0
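
The two comparisons made in this subsection can be verified directly from the tabulated values. The Python sketch below restates the entries of Table 7.7 and the selected IMM-EKF row of Table 7.2; the helper names are hypothetical.

```python
# Table 7.7: SC model -> (roll-roll RMS, roll-pitch RMS) in metres.
SC_RMS = {
    1: (86.78, 92.31),
    2: (81.06, 93.92),
    3: (82.25, 100.84),
    4: (85.13, 106.23),
    5: (84.99, 102.34),
    6: (87.11, 101.15),
    7: (83.53, 102.02),
}

# Selected IMM-EKF configuration from Table 7.2: (roll-roll, roll-pitch).
EKF_RMS = (132.09, 140.13)


def column_mean(table, column):
    """Average RMS error over all scattering center models for one maneuver."""
    return sum(values[column] for values in table.values()) / len(table)


def worst_case_reduction(table, baseline, column):
    """Smallest percentage RMS reduction versus the IMM-EKF for one maneuver."""
    worst = max(values[column] for values in table.values())
    return 100.0 * (baseline[column] - worst) / baseline[column]


if __name__ == "__main__":
    for column, name in enumerate(("roll-roll", "roll-pitch")):
        print(
            name,
            "mean RMS:", round(column_mean(SC_RMS, column), 2),
            "SC 1:", SC_RMS[1][column],
            "worst-case reduction (%):",
            round(worst_case_reduction(SC_RMS, EKF_RMS, column), 1),
        )
```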

7.3  Tracking  Results  for  a  Single  Target  in  Clutter 

In the previous section, we began the empirical analysis of our particle filtering
algorithm using the simplest possible scenario: tracking a single target of known identity.
Because we did not allow false alarms or missed detections, data association was not
an issue. In this section, we consider the more challenging task of tracking a single
target of known identity in the presence of clutter (i.e., false alarms) and missed
detections. By continuing to assume that the identity of the target is known, any performance
degradation (compared to the results from the previous section) can then be attributed
to clutter and $P_D < 1$. As before, we will evaluate the performance of our IMM-EKF,
generic PF, and auxiliary PF by Monte Carlo simulation.

Before we can present the experimental results, there is a technical difficulty that
must be addressed. As discussed in Sections 5.3 and 6.3, we follow popular practice
and assume that the number of false alarms in scan $k$ is Poisson-distributed with mean
$\beta V_k$, where $\beta$ is the clutter density and $V_k$ is the scan volume. In the previous
section, filter performance was quantified using two figures of merit: the RMS error
$\epsilon_{RMS}$ and the lost track count $N_L$. These metrics could be compared between filters
because the simulations occurred under identical circumstances (namely, the same two
flight paths). In this section, a fair comparison is much more difficult to achieve. To
see why this is so, we note that the IMM-EKF operates in a two-dimensional measurement
space (delay and Doppler). On the other hand, because the particle filter incorporates RCS
as a data feature, it operates in a three-dimensional measurement space. As such, it is
not possible to simply use the same value of clutter density to ensure a fair comparison.
Instead, we need a different metric to quantify the extent of clutter for each trial.
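
As an illustration of the clutter model described above, the following Python sketch draws
the false alarms for one scan. It is only a sketch under stated assumptions: the scan volume
is taken to be a box, false alarms are taken to be uniformly distributed inside it, and the
function names (`sample_poisson`, `generate_clutter`) are hypothetical rather than taken
from the thesis. Because the mean is $\beta V_k$, the same $\beta$ yields different false-alarm
counts in two- and three-dimensional measurement spaces.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate with Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def generate_clutter(beta, bounds, rng):
    """Return the false alarms for one scan.

    beta   -- clutter density (false alarms per unit volume)
    bounds -- list of (low, high) pairs, one per measurement dimension (assumed box volume)
    """
    volume = math.prod(high - low for low, high in bounds)
    n_false = sample_poisson(beta * volume, rng)
    # Assumption: false alarms are uniform over the scan volume.
    return [[rng.uniform(low, high) for low, high in bounds] for _ in range(n_false)]


if __name__ == "__main__":
    rng = random.Random(0)
    scan = generate_clutter(beta=3.0, bounds=[(0.0, 1.0), (0.0, 2.0)], rng=rng)
    print(len(scan), "false alarms in this scan")
```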

For our IMM-EKF, the measurement $z_k^{(j)}$ satisfies the gate condition for a given
target track if

$$
\left(y_k^{(j)}\right)^T S_k^{-1}\, y_k^{(j)} < 2 \ln\left[\frac{P_D}{\beta\,(1 - P_D P_G)\,2\pi\sqrt{|S_k|}}\right], \tag{7.7}
$$

where $y_k^{(j)}$ is the innovation (the measurement $z_k^{(j)}$ minus the predicted measurement),
$S_k$ is its covariance matrix, and $\beta$ is the clutter density. Equation (7.7) is derived
by requiring that the posterior probability of associating measurement $j \neq 0$ with the
target exceed that of the null association $j = 0$, using the association probabilities
from (5.32). In words, a measurement that is more likely to have been produced by the
target than by a false alarm is said to satisfy the gate condition. Any measurement that
does not satisfy at least one target gate is discarded prior to data association.⁷ For
simulations such as ours in which the number of targets is known a priori, ungated false
alarms have no effect on track accuracy. As such, the average number of gated measurements
can be used in place of $\beta$ to quantify the extent of clutter. We define this average as

$$
\bar{N}_g^{EKF} \triangleq \frac{\text{total \# of gated measurements}}{(N_k - 1)\, N_{MC}}, \tag{7.8}
$$

where $N_k$ is the number of scans, and $N_{MC}$ is the number of Monte Carlo trials. The
number of scans is $(N_k - 1)$ instead of $N_k$ because there is no measurement update for
the first scan. (It is used to initialize the filter.)
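
The gate test and the average gate count can be sketched in a few lines of Python. This is
an illustration only: it assumes that the threshold in (7.7) is
$2\ln[P_D / (\beta(1 - P_D P_G)\,2\pi\sqrt{|S_k|})]$ for a two-dimensional measurement, and the
function names and the nested-list representation of $S_k$ are hypothetical choices, not
the thesis implementation.

```python
import math


def gate_threshold(p_d, p_g, beta, det_s):
    """Assumed right-hand side of (7.7) for a two-dimensional measurement space."""
    if beta == 0.0:
        return math.inf  # no clutter: every measurement passes the gate
    return 2.0 * math.log(p_d / (beta * (1.0 - p_d * p_g) * 2.0 * math.pi * math.sqrt(det_s)))


def mahalanobis_sq(y, s):
    """Compute y^T S^{-1} y for a 2-vector y and a 2x2 covariance s given as nested lists."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return (d * y[0] ** 2 - (b + c) * y[0] * y[1] + a * y[1] ** 2) / det


def in_gate(z, z_pred, s, p_d, p_g, beta):
    """Return True if measurement z falls inside the gate around the prediction z_pred."""
    y = [z[0] - z_pred[0], z[1] - z_pred[1]]  # innovation
    (a, b), (c, d) = s
    return mahalanobis_sq(y, s) < gate_threshold(p_d, p_g, beta, a * d - b * c)


def average_gate_count(total_gated, n_scans, n_trials):
    """Average gate count as in (7.8); the first scan only initializes the filter."""
    return total_gated / ((n_scans - 1) * n_trials)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(in_gate([1.0, 0.0], [0.0, 0.0], identity, p_d=0.9, p_g=1.0, beta=0.01))
```

Note that when the argument of the logarithm drops below one, the threshold becomes negative
and no measurement can satisfy the gate, which is the expected behavior for very dense clutter.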

⁷ Actually, in a real implementation, these ungated measurements would be used to initiate
new tracks.

While $\bar{N}_g^{EKF}$ would seem to provide a means of ensuring that the simulations for
each filter were conducted under similar circumstances, the need to estimate the covariance
$S_k$ for (7.7) is problematic for the particle filter. First, we recall that one of
the main advantages of the particle filtering approach is the ability to model highly
non-Gaussian densities. Using a single statistic, such as the covariance matrix $S_k$, to
summarize the extent of a potentially multimodal distribution negates much of this benefit.
Second, if degeneracy of the sample weights results in frequent resampling, the
particle set may become impoverished (i.e., may contain few unique values). In this
case, the estimate of $S_k$ will be inaccurate, which could lead to further degradation if
the resulting gate becomes too small to include the true measurement. For these reasons,
particle filters often avoid gating altogether [65]. However, in order to have some
basis on which to compare the performance of the IMM-EKF and the particle filter, we
will perform a crude type of gating based upon (6.30). In that equation, the likelihood
function in the presence of clutter and missed detections was shown to satisfy


$$
p\big(z_k \mid x_k^{(i)}\big) \propto \frac{\beta(1 - P_D)}{P_D} + \sum_{j=1}^{m_k} p\big(z_k^{(j)} \mid x_k^{(i)}\big), \tag{7.9}
$$

where $m_k$ is the number of measurements received during scan $k$. Thus, we see that
any measurement $z_k^{(j)}$ whose likelihood is substantially less than $\beta(1 - P_D)/P_D$ will
make a negligible contribution to $p(z_k \mid x_k^{(i)})$. With this as motivation, a measurement
$z_k^{(j)}$ will be said to satisfy the gate condition for particle $x_k^{(i)}$ if

$$
p\big(z_k^{(j)} \mid x_k^{(i)}\big) > g\, \frac{\beta(1 - P_D P_G)}{P_D}, \tag{7.10}
$$


where $g = 0.02$ in our experiments. Only those measurements that satisfy (7.10)
will be used in (7.9) to update the particle weight $w_k^{(i)}$. This gate definition has two
attractive properties for particle filters. First, it does not rely upon an estimate of the
state covariance. Second, the gate threshold varies directly with $\beta$. Thus, as the number
of false alarms increases, so will the gate threshold. With (7.10) as the gate condition
for our particle filter, we can now define $\bar{N}_g^{PF}$. Because the gate is applied individually
for each particle, $\bar{N}_g^{PF}$ must be averaged over the entire particle set, as well as each
scan and Monte Carlo trial,

$$
\bar{N}_g^{PF} \triangleq \frac{\text{total \# of gated measurements}}{N_p\,(N_k - 1)\, N_{MC}}, \tag{7.11}
$$

where measurements can be counted multiple times if they satisfy (7.10) for more than
one particle.
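
The per-particle gate can be sketched as follows. Two assumptions are made here and should
be checked against (6.30): that the clutter likelihood has the form of a constant term
$\beta(1 - P_D)/P_D$ plus the sum of the individual measurement likelihoods, and that a
measurement passes the gate when its likelihood exceeds $g\,\beta(1 - P_D P_G)/P_D$. All
function names are hypothetical.

```python
def gate_floor(beta, p_d, p_g, g=0.02):
    """Assumed gate threshold of (7.10): g * beta * (1 - P_D * P_G) / P_D."""
    return g * beta * (1.0 - p_d * p_g) / p_d


def gated_likelihood(meas_likelihoods, beta, p_d, p_g, g=0.02):
    """Return (unnormalized likelihood, gate count) for a single particle.

    meas_likelihoods -- p(z_j | x_i) for every measurement j in the scan
    Only measurements above the gate floor contribute to the sum (assumed form of (7.9)).
    """
    floor = gate_floor(beta, p_d, p_g, g)
    gated = [value for value in meas_likelihoods if value > floor]
    return beta * (1.0 - p_d) / p_d + sum(gated), len(gated)


def average_pf_gate_count(total_gated, n_particles, n_scans, n_trials):
    """Average gate count per particle, per scan (excluding the first), per trial."""
    return total_gated / (n_particles * (n_scans - 1) * n_trials)


if __name__ == "__main__":
    value, n_gated = gated_likelihood([1e-6, 0.5, 0.001], beta=2.0, p_d=0.9, p_g=1.0)
    print(f"likelihood = {value:.4f}, gated measurements = {n_gated}")
```

Because the floor scales with $\beta$, raising the clutter density automatically discards more
low-likelihood measurements, without any estimate of the state covariance.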

For the rest of this section, tracking performance will be quantified using $\epsilon_{RMS}$,
$N_L$, and either $\bar{N}_g^{EKF}$ or $\bar{N}_g^{PF}$ for various choices of $P_D$ and $\beta$. As a reminder of the
difference in dimension of the underlying measurement space, we will use $\beta_2$ to denote
the (two-dimensional) clutter density for the IMM-EKF and $\beta_3$ to denote the
(three-dimensional) density for the particle filter.⁸ The average gate counts will allow us to
compare the extent of clutter between systems. However, because the gate conditions
for $\bar{N}_g^{EKF}$ and $\bar{N}_g^{PF}$ are different, this is not a strict equivalence.


7.3.1  Results  for  the  IMM  extended  Kalman  filter 

Our IMM-EKF simulations were performed using the optimal set of IMM noise variances,
as determined in Section 7.2.1. We considered two values for the probability
of detection: $P_D = 0.9$ and $P_D = 0.7$. Table 7.8 lists the tracking results for several
different clutter densities using the roll-roll maneuver. Note that for $\beta_2 = 0$ and
$P_D \neq 1$, some of the scans will be empty (i.e., will not contain any measurements).
In these cases, the EKF state and covariance matrices are predicted to the current time,
but there is no measurement update. The results in Table 7.8 should be compared to the
single-target results from Table 7.2. In those simulations, the optimal choice of IMM
noise variances yielded $\epsilon_{RMS} = 132.09$ m and $N_L = 3$ for the roll-roll maneuver.
When compared to the results in Table 7.8, the negative impact of clutter and missed
detections is apparent. Because the average number of false measurements per gate is
approximately equal to $\bar{N}_g^{EKF} - P_D$, we find that performance can be degraded by just
a few false associations. As an example, consider the $(\beta_2, P_D) = (200, 0.9)$ simulations.
In this case, false alarms contained in $1.05 - 0.9 = 0.15$, or 15%, of the gates and a 10%
chance of missed detection were enough to cause 30 out of 100 tracks to be lost.⁹

Table 7.8: IMM-EKF Tracking Accuracy versus $\beta_2$ and $P_D$

            |          P_D = 0.9             |          P_D = 0.7
  β_2       | ε_RMS (m)   N_L   N̄_g^EKF     | ε_RMS (m)   N_L   N̄_g^EKF
  ----------+--------------------------------+-------------------------------
    0       |  157.79      9     0.91        |  210.62     34     0.73
   25       |  167.02      6     0.93        |  189.40     38     0.73
   50       |  167.63     17     0.95        |  223.54     44     0.74
  100       |  175.99     24     1.00        |  191.93     56     0.77
  200       |  162.54     30     1.05        |  212.01     76     0.81

⁸ $\beta_2$ is unitless; $\beta_3$ has units m⁻².
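
The approximation used above (false measurements per gate is roughly the average gate count
minus $P_D$) can be tabulated for the $P_D = 0.9$ columns of Table 7.8. The short Python
sketch below only restates that arithmetic; the dictionary values are copied from the table,
and the function name is a hypothetical label.

```python
# Average gate counts for P_D = 0.9, keyed by clutter density beta_2 (from Table 7.8).
P_D = 0.9
GATE_COUNTS = {0: 0.91, 25: 0.93, 50: 0.95, 100: 1.00, 200: 1.05}


def false_per_gate(avg_gate_count, p_d):
    """Approximate number of false measurements per gate: average gate count minus P_D."""
    return avg_gate_count - p_d


if __name__ == "__main__":
    for beta_2, n_g in GATE_COUNTS.items():
        estimate = false_per_gate(n_g, P_D)
        print(f"beta_2 = {beta_2:3d}: about {estimate:.2f} false measurements per gate")
```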

Overall, there are two conclusions that can be drawn from Table 7.8 for our application.
First, missed detections are more damaging to track accuracy than false alarms.
This can be seen by comparing the first row of the table ($\beta_2 = 0$) to the others. We find
that $\epsilon_{RMS}$ degrades immediately for $P_D \neq 1$. Increasing $\beta_2$, on the other hand, mostly
affects $N_L$, the number of lost tracks. For those cases where $\epsilon_{RMS}$ actually decreases
as $\beta_2$ grows, we refer back to the definition of $\epsilon_{RMS}$ in (7.3). We see that $\epsilon_{RMS}$ is
averaged over $N_{MC} - N_L$ trials. Thus, as the number of lost tracks grows with $\beta_2$, so
will the Monte Carlo variation in $\epsilon_{RMS}$. Second, a modest change in $P_D$ can have
severe consequences. While the IMM-EKF still functions for $P_D = 0.9$ and small
values of $\beta_2$, it is almost unusable for $P_D = 0.7$, regardless of the choice of $\beta_2$. To
understand why this is so, consider the track loss example shown in Figure 7.8. In the
plot, the red line represents the true roll-roll trajectory; the blue line is the IMM-EKF
estimate. In this example, missed detections at the crucial onset of the first coordinated
turn cause the filter to miss the maneuver entirely. Instead, the filter locks onto a series
of false alarms that match its constant velocity mode. This demonstrates that the target
trajectory can have a substantial impact on filter performance in the presence of clutter
and missed detections. Thus, the results in Table 7.8 are not necessarily indicative of
the performance that could be achieved against a nonmaneuvering target.

⁹ This is just an approximation because the true measurement, when detected, may not always
satisfy the gate condition. There can also be more than one false alarm per gate.

Figure 7.8: Example of track loss for IMM-EKF due to clutter and missed detections
(plot title: "Example of Track Loss, Roll-Roll Maneuver"). For the trial shown,
$\beta_2 = 100$ and $P_D = 0.7$.


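Table 7.9: PF Tracking Accuracy versus $\beta_3$ and $P_D$

                       |          P_D = 0.9            |          P_D = 0.7
  Filter         β_3   | ε_RMS (m)   N_L   N̄_g^PF     | ε_RMS (m)   N_L   N̄_g^PF
  ---------------------+-------------------------------+------------------------------
  Generic          0   |   80.76      3     0.87       |   96.48     14     0.66
  Particle         2   |   88.58      2     1.05       |  106.09      3     0.78
  Filter           4   |   84.44      3     1.27       |  106.09      3     0.93
  (N_p = 600)      8   |   92.09      5     1.65       |  117.97     14     1.25
                  16   |  101.92     14     2.29       |  153.12     21     1.92
  ---------------------+-------------------------------+------------------------------
  Auxiliary        0   |   83.12      1     0.90       |   97.62     11     0.72
  Particle         2   |   94.72      0     1.16       |   93.78      4     0.92
  Filter           4   |   86.79      3     1.39       |   95.00     13     1.11
  (N_p = 400)      8   |   93.07      6     1.81       |  116.21     16     1.47
                  16   |  133.64     19     2.55       |  149.02     27     2.19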

7.3.2  Results  for  the  particle  filters 

In this section, we present results for a generic particle filter and an auxiliary particle
filter in the presence of clutter and missed detections. As in the previous section,
simulations will be performed using the roll-roll maneuver with $P_D = 0.9$ and $P_D = 0.7$.
Because the proposal distribution of the auxiliary particle filter incorporates the current
measurement, fewer particles are needed to achieve a desired accuracy. As such, we
will take $N_p = 600$ for the generic particle filter and $N_p = 400$ for the auxiliary
particle filter. Table 7.9 includes the tracking accuracies for both types of particle filters
over a range of clutter densities. The impact of clutter and missed detections can be
determined by referring back to Table 7.3 from Section 7.2.2. For those simulations,
the generic particle filter with $N_p = 600$ achieved $\epsilon_{RMS} = 86.78$ m and $N_L = 0$, while
the auxiliary particle filter with $N_p = 400$ achieved $\epsilon_{RMS} = 80.25$ m and $N_L = 0$.

First, we compare the clutter performance of both particle filters to the results
achieved by the IMM-EKF. As discussed previously, our intent was to choose values of
$\beta_2$ and $\beta_3$ such that $\bar{N}_g^{EKF} \approx \bar{N}_g^{PF}$. Comparing Tables 7.8 and 7.9, we find that the
average gate counts for the particle filters tend to exceed those of the IMM-EKF. However,
because the PF and IMM-EKF use different gating conditions, this does not necessarily
mean that the particle filters are faced with the more difficult task. As discussed
previously, the average gate counts should be interpreted qualitatively, and to this extent,
we find that the results are worthy of comparison. Considering $\epsilon_{RMS}$ and $N_L$ from
Tables 7.8 and 7.9, it is clear that the particle filter is more robust than the IMM-EKF
to false alarms and missed detections. For $P_D = 0.9$, it takes $\bar{N}_g^{PF}$ in excess of 2.0
before filter performance begins to degrade substantially. This means that, on average,
each particle is updated using both the true measurement and a false alarm. We also
find that $P_D = 0.7$, which rendered the IMM-EKF inoperative, results in substantially
fewer lost tracks for the particle filter. To understand why the particle filter outperforms
the IMM-EKF in the presence of clutter and missed detections, consider how each algorithm
maintains its estimate of the posterior density. For the extended Kalman filter, the
posterior density is completely specified by (an estimate of) the conditional mean and
covariance. Thus, a single false measurement that shifts the conditional mean away
from its true value during the Kalman update can change the posterior significantly.
The posterior density of the particle filter, on the other hand, is "encoded" in the
locations and weights of hundreds of particles. Thus, while any single false measurement
might disturb a portion of the particles, it is unlikely that the entire distribution would
be substantially distorted. Instead, it would require many high-likelihood false alarms
to cause a significant change in the entire particle set, something that is statistically less
likely.

Now, we compare the performance of the generic particle filter with that of the
auxiliary particle filter. Broadly speaking, there is little difference between the
performance of the filters for the values of $\beta_3$ and $P_D$ that we have considered. However, one
observation can be made. It appears that the generic particle filter performs slightly
better than the auxiliary particle filter for the simulations with $P_D = 0.9$. On the other
hand, the auxiliary particle filter seems to do slightly better when $P_D = 0.7$. Recall
that the proposal distribution of the auxiliary particle filter is designed to shift particles
into areas of the state space that will result in high data likelihoods. When false
measurements are present, this technique can backfire because particles are then shifted
into areas of the state space that match the false alarms, but not necessarily the true
posterior. On the other hand, this same shortcoming allows the auxiliary particle filter
to recover more quickly after a missed detection by shifting particles during state
prediction back towards the true measurement, once it reappears. As discussed in the
previous section, missed detections are more troublesome than false alarms when tracking
maneuvering targets. As such, if $P_D$ is low, the APF's ability to recover quickly
from a missed detection makes it the better choice. However, if missed detections are
unlikely because $P_D$ is high, the generic particle filter seems to be more robust to false
measurements.

7.4 Classification Results for a Single Target


In this section, we investigate the classification performance of the particle filter.
Because the IMM-EKF cannot incorporate RCS as a measurement, it is not suitable as
a classifier for FM-band passive radar applications and will not be discussed in this
section. Instead, we will compare results for the generic and auxiliary particle filters.
For the simulations in this section, we take $\beta_3 = 0$ and $P_D = 1$. This will give an
indication of the best that an RCS-based classifier could be expected to achieve. We
remove these restrictions when we consider multitarget tracking in the next section.

Before presenting our results, we must first discuss the metric that will be used to
gauge classification performance. Given the measurement sequence $z_{1:k}$, the maximum
a posteriori (MAP) estimate of target class at scan $k$ is

$$
\hat{c}_k = \arg\max_{j \in \{1, \ldots, N_c\}} \Pr\big(c(x_k) = j \mid z_{1:k}\big), \tag{7.12}
$$

where $N_c$ is the number of classes and $c$ is a decision rule that extracts the discrete
class label from the mixed state $x_k$. While $\hat{c}_k$ is the class that would be declared by the
filter at scan $k$, further insight can be obtained by considering the entire collection of
probabilities $\{\Pr(c(x_k) = j \mid z_{1:k}) : j = 1, \ldots, N_c\}$. This collection allows us to see
which classes are most often confused. These probabilities can be approximated using
the particle weights,

$$
\Pr\big(c(x_k) = j \mid z_{1:k}\big) \approx \sum_{i=1}^{N_p} w_k^{(i)}\, 1_{c(x_k^{(i)}),\, j}, \tag{7.13}
$$

where $1_{a,b} = 1$ if $a = b$ and zero otherwise. In order to gauge the performance throughout
the maneuver, we will average each of these probabilities across many scans. More
specifically, we compute the average class probability for class $j$ as

$$
\frac{1}{(N_{MC} - N_L)(N_k - k_0 + 1)} \sum_{n=1}^{N_{MC} - N_L} \sum_{k=k_0}^{N_k} \sum_{i=1}^{N_p} w_k^{(i)}\, 1_{c(x_k^{(i)}),\, j}, \tag{7.14}
$$

where we have suppressed the particle weights' dependence on the simulation index
$n$. The index $k_0$ denotes the scan in which averaging begins. By choosing $k_0 = N_k$,
we have the average a posteriori probability given the entire measurement sequence
$z_{1:N_k}$. However, it is advantageous to choose a smaller value of $k_0$ in order to provide
a broader view of the classifier's performance. This way, the average class probabilities
are not overly sensitive to the measurement noise in the last few scans. In our
experiments, we take $k_0 = 11$, which results in averaging from $t = 4$ s to $t = 30$ s.
Because all classes are equally likely at $t = 0$, setting $k_0$ any smaller yields results that
are artificially influenced by the filter's initialization.
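
The classification metric can be illustrated with a small Python sketch. It assumes that
each particle carries an integer class label in $1, \ldots, N_c$ and that the weights at each
scan are already normalized; the function names and the nested-list layout of the trial data
are hypothetical conveniences, not the thesis code.

```python
def class_probabilities(weights, labels, n_classes):
    """Approximate Pr(class = j | data), as in (7.13), by summing the weights per class."""
    probs = [0.0] * n_classes
    for weight, label in zip(weights, labels):
        probs[label - 1] += weight
    return probs


def map_class(probs):
    """MAP class estimate, as in (7.12): the label with the largest posterior probability."""
    return max(range(len(probs)), key=probs.__getitem__) + 1


def average_class_probabilities(trials, k0):
    """Average class probabilities, as in (7.14), over scans k0..N_k of the retained trials.

    trials[n][k - 1] holds the class-probability list of trial n at scan k;
    lost tracks are assumed to have been removed from `trials` beforehand.
    """
    n_classes = len(trials[0][0])
    totals = [0.0] * n_classes
    count = 0
    for scans in trials:
        for probs in scans[k0 - 1:]:
            for j, p in enumerate(probs):
                totals[j] += p
            count += 1
    return [total / count for total in totals]


if __name__ == "__main__":
    probs = class_probabilities([0.5, 0.3, 0.2], [1, 2, 1], n_classes=3)
    print(probs, "MAP class:", map_class(probs))
```

Each row of a confusion matrix then corresponds to one call of `average_class_probabilities`
on the trials simulated with a given true class.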

A target class is completely specified in our classifier by an RCS table and a set
of aerodynamic coefficients. However, as discussed in Section 7.1.3, we do not have
enough CAD models to use actual RCS tables in our experiments. Instead, we will use
the scattering center model presented in Section 7.1.3 to generate synthetic RCS data.
By varying the parameters of the scattering center model, we can easily create different
target classes. The seven classes that we will use are shown in Figure 7.3.

We have a similar problem when it comes to the aerodynamic coefficients required
by our state model. Although we used the actual coefficients of an F-4E Phantom to
generate the roll-roll and roll-pitch trajectories, we do not have aerodynamic coefficients
for any other real aircraft. While we could use the F-4E coefficients for all seven
classes, this would negate some of the benefit of performing tracking and classification
jointly because variation in the aerodynamic coefficients provides a secondary means
of class discrimination. More specifically, if a target's velocity and orientation can
be estimated through Doppler and RCS, the aerodynamic coefficients then specify the
ensuing flight path (until the next maneuver). If the target does not follow this path,
either the estimates of velocity and orientation are wrong, or the target does not belong
to the hypothesized class. For these reasons, the aerodynamic coefficients for classes
2 through 7 will be randomly selected to approximate the characteristics of a fighter-sized
plane. Class 1 will continue to use the actual F-4E coefficients. Ground truth
trajectories for classes 2 through 7 can then be generated by driving the state model for
each class with the original sequence of pilot commands from the roll-roll maneuver.¹⁰

7.4.1  Simulations  using  three  classes 

In this section, we compare the performance of the two particle filters using classes
1-3 from Figure 7.3. As mentioned previously, we consider the roll-roll maneuver
with $\beta_3 = 0$ and $P_D = 1$. The average class probabilities are computed using (7.14)
with $k_0 = 11$ and $N_{MC} = 100$ (as always). The resulting probabilities are listed
in Table 7.10 for the generic particle filter and Table 7.11 for the auxiliary particle
filter. We find that the classification performance is excellent for both filters. Each
table also lists the RMS error as a way of quantifying the impact that class uncertainty
has on track accuracy. Compared to the results for a single target of known identity
(see Table 7.3), we find that the generic particle filter maintains its accuracy while the
auxiliary particle filter degrades slightly. This is not surprising because the particle sets
are now split between the three classes (although usually not evenly). In a sense, some
of the resources that went to estimating the position and velocity of the target before
are now being spent trying to determine the correct class. For the generic particle filter
with $N_p = 600$, there are enough particles to spare. For the auxiliary particle filter,
which used fewer particles in this experiment ($N_p = 400$), track accuracy is degraded.
The solution is simple, of course; we can always increase $N_p$ if needed.

10Recall,  the  pilot  command  set  is  A 9^,T£}.  The  mode  sequence  and  accompanying  inputs 

serve  as  deterministic  but  unknown  inputs  to  the  flight  model.  Thus,  while  all  seven  classes  will  execute 
the  same  roll-roll  maneuver,  the  resulting  trajectories  will  differ  somewhat  because  of  the  variation  in  the 
aerodynamic coefficients.

Table 7.10: Confusion Matrix for Generic PF ($N_p = 600$)

  Correct  |  Average Class Probability  |
  Class    |     1       2       3       | ε_RMS (m)
  ---------+-----------------------------+----------
     1     |   .987    .004    .009      |   82.27
     2     |   .001    .999    .000      |   86.32
     3     |   .002    .004    .994      |   88.71

Table 7.11: Confusion Matrix for Auxiliary PF ($N_p = 400$)

  Correct  |  Average Class Probability  |
  Class    |     1       2       3       | ε_RMS (m)
  ---------+-----------------------------+----------
     1     |   .991    .007    .002      |   89.82
     2     |   .000    .999    .000      |   95.81
     3     |   .002    .002    .996      |   94.82

7.4.2  Simulations  using  seven  classes 

In this section, we consider the more challenging task presented by a 7-class problem.
These experiments will use scattering center models 1-7 from Figure 7.3 to generate
the synthetic RCS measurements for each class. As before, we take $\beta_3 = 0$ and
$P_D = 1$. The average class probabilities will also be computed in the same manner as in
the previous section. Because there are over twice as many classes as before, we will
increase $N_p$ by 200 particles for each filter. The average class probabilities for the
generic and auxiliary particle filters are presented in Tables 7.12 and 7.13, respectively.
Overall, we find that both particle filters perform quite well on this task. This offers
hope that a real implementation designed around actual RCS tables might perform
similarly well. Ironically, scattering center model 1, which we used to generate the
synthetic RCS data for all the single-class experiments, happens to be the most difficult
to identify. It also yields the largest RMS error of all the classes!

If we compare the diagonal entries of the two confusion matrices, we find that
the auxiliary particle filter achieves a slight advantage in classification performance.

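Table 7.12: Confusion Matrix for Generic PF ($N_p = 800$)

  Correct  |             Average Class Probability                   |
  Class    |    1      2      3      4      5      6      7          | ε_RMS (m)
  ---------+---------------------------------------------------------+----------
     1     |  .685   .015   .000   .081   .108   .052   .059         |   96.37
     2     |  .000    --    .001   .000   .000   .000   .001         |   78.55
     3     |  .000   .000   .989   .000   .010   .000   .000         |   84.03
     4     |  .080   .000   .000   .773   .017   .020   .109         |   84.83
     5     |  .051   .002   .000   .025   .882   .001   .039         |   83.43
     6     |  .118   .000   .000   .069   .040   .762   .010         |   89.60
     7     |  .054   .053   .000   .040   .023   .000   .830         |   91.82

  (--: one entry of the class-2 row is missing from the source; it is assumed here to be
  the diagonal entry.)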


Table 7.13: Confusion Matrix for Auxiliary PF ($N_p = 600$)

  Correct  |             Average Class Probability                   |
  Class    |    1      2      3      4      5      6      7          | ε_RMS (m)
  ---------+---------------------------------------------------------+----------
     1     |  .736   .011   .000   .070   .057   .071   .055         |   97.57
     2     |  .000   .989   .001   .000   .000   .000   .010         |   88.28
     3     |  .011   .003   .955   .000   .019   .000   .011         |   87.50
     4     |  .073   .000   .000   .787   .014   .055   .071         |   83.87
     5     |  .041   .003   .001   .007   .890   .010   .048         |   83.91
     6     |  .139   .000   .000   .043   .003   .813   .002         |   82.29
     7     |  .021   .035   .000   .055   .052   .000   .836         |   88.71

Notably, for the most difficult class to recognize (SC model 1), the average class
probability is over 7% greater, in relative terms, for the auxiliary particle filter than
for the generic PF (0.736 versus 0.685). The APF probability for class 6 also achieves a
similar gain versus that of the generic PF. Unfortunately, the other diagonal terms are
roughly the same or even slightly smaller for the APF. Because we included the auxiliary
particle filter in this thesis in hopes that it would provide better classification than the
generic PF, this comparison is a bit disappointing. Furthermore, recall that the auxiliary
particle filter requires just under twice as much time as the generic PF (for the same value
of $N_p$). Thus, the auxiliary PF used in these experiments ($N_p = 600$) will be almost 50%
slower than the generic PF ($N_p = 800$). Depending on the computational resources available,
the slight improvement in classification offered by the APF may not be attractive enough to
offset the increase in run-time.
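
The run-time comparison above follows from simple arithmetic, sketched below in Python. The
sketch assumes that run-time scales linearly with the number of particles and uses a
per-particle cost ratio of exactly 2.0 as an upper bound on "just under twice"; both are
illustrative assumptions, and the function name is hypothetical.

```python
def relative_runtime(cost_per_particle_ratio, n_apf, n_generic):
    """Run-time of the APF relative to the generic PF, assuming cost linear in N_p."""
    return cost_per_particle_ratio * n_apf / n_generic


if __name__ == "__main__":
    # Seven-class experiments: APF with 600 particles, generic PF with 800 particles.
    ratio = relative_runtime(2.0, 600, 800)
    print(f"APF run-time is at most about {ratio:.2f}x that of the generic PF")
```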


7.5  Multitarget  Tracking  and  Classification  Results 

In this final section, we consider the full tracking scenario. Each filter will attempt
to track three maneuvering targets in the presence of clutter and missed detections.
In order to make the data association more challenging, all three target trajectories
will cross, thereby placing the targets in close proximity. Finally, the class identities
of the targets will be unknown to the particle filters in order to test their ability to
jointly track and classify in the presence of clutter and missed detections. (Note that
the classification experiments in the previous section were performed in a clutter-free
environment with zero probability of miss. Both restrictions are lifted in this section.)

The three trajectories that we will consider are shown in the left panel of Figure 7.9.
The green arrows are spaced 4 s apart and point in the direction of travel. Thus, we
see that from $t = 12$ s to $t = 16$ s, all three targets are in close proximity. In the following
discussion, we will denote the $i$th target as $t_i$. For the purpose of classification, each
target must be assigned a class identity. To simplify matters, $t_1$ will be assigned to
class 1, $t_2$ will be assigned to class 2, and $t_3$ will be assigned to class 3.

Recall from Section 7.4 that a class identity conveys two types of information: (1) kinematic
coefficients Ac for our flight model and (2) a scattering center model for RCS
data generation. As discussed in the same section, the kinematic coefficients of class 1
are those of an F-4E. The kinematic coefficients for the other classes are generated randomly
to simulate the characteristics of fighter-sized planes. The blue trajectory flown
by $t_1$ is identical to the roll-roll maneuver, which we have been using throughout this
chapter. The red trajectory flown by $t_2$ is generated by running our state model (see
Figures 6.4-6.5) using the pilot commands from the roll-roll maneuver and the class-2
kinematic coefficients. The resulting flight path is then rotated 120° clockwise. The
green trajectory flown by $t_3$ is generated in the same way, except the class-3 kinematic
coefficients are used and the resulting trajectory is then rotated 120° counterclockwise.

Figure 7.9: Left panel: ground truth trajectories for each target. Arrows are spaced
4 s apart and indicate the direction of travel. Right panel: APF track estimates. Black
squares indicate scans in which the MAP class for the given track was incorrect.

Table 7.14: IMM-EKF Multitarget Tracking Accuracy

  Target  | ε_RMS (m)   N_L   N̄_g^EKF
  --------+----------------------------
     1    |  158.76      9     0.93
     2    |  173.03     28     1.03
     3    |  149.10     29     1.02
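
The rotation step used to build the second and third trajectories can be sketched as a plain
planar rotation. This is an illustration under assumptions that the text does not state:
the rotation is taken to act on the horizontal $(x, y)$ coordinates only, the pivot point
(`center`) is a free parameter, and the function name is hypothetical.

```python
import math


def rotate_path(points, degrees_ccw, center=(0.0, 0.0)):
    """Rotate a list of (x, y) points about `center`.

    Positive angles rotate counterclockwise; pass a negative angle for a clockwise rotation.
    """
    theta = math.radians(degrees_ccw)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    return [
        (cx + cos_t * (x - cx) - sin_t * (y - cy),
         cy + sin_t * (x - cx) + cos_t * (y - cy))
        for x, y in points
    ]


if __name__ == "__main__":
    path = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 500.0)]  # made-up sample path (m)
    clockwise = rotate_path(path, -120.0)          # e.g., the second target
    counterclockwise = rotate_path(path, 120.0)    # e.g., the third target
    print(clockwise[1], counterclockwise[1])
```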

We present the results of the IMM-EKF simulations first. We ran 100 Monte Carlo trials
using the optimal choice of $\{q^{(CV)}, q^{(CA)}, q^{(CT)}\}$, as determined in Section 7.2.1.
The probability of detection and clutter density were $P_D = 0.9$ and $\beta_2 = 25$,
respectively. The RMS error $\epsilon_{\mathrm{RMS}}$ and lost track count $N_L$ are listed in
Table 7.14 for each target. Because the trajectory flown by $t_1$ is identical to the roll-roll
maneuver, the filter performance in this case is directly comparable to the
($\beta_2 = 25$, $P_D = 0.9$) entry from Table 7.8 (i.e., $\epsilon_{\mathrm{RMS}} = 167.02$ m,
$N_L = 6$). We find that the performance against the roll-roll maneuver (the blue trajectory
in this section) is relatively unaffected by the extension to multiple targets. However, this
is not the whole story. Consider the average gate counts in Table 7.14. For $t_1$,
$N_g^{\mathrm{EKF}} = 0.93$, precisely the same value as in the single-target results in
Table 7.8. This means that none of the measurements from $t_2$ or $t_3$ ever satisfied the
gate condition for $t_1$ (see Equation (7.7)). As such, this remains effectively a
single-target experiment for the blue trajectory.

Looking at the average gate counts for $t_2$ and $t_3$, though, we find
$N_g^{\mathrm{EKF}} \approx 1.03$, a substantial difference from the average gate count of
$t_1$. This means that the gates for $t_2$ and $t_3$ overlap near the middle of their
trajectories. While their RMS errors are similar to the value of $\epsilon_{\mathrm{RMS}}$
achieved for $t_1$, the number of lost tracks out of 100 simulations is roughly three times
larger than the value of $N_L$ for $t_1$. Thus, even though the IMM-EKF uses the JPDA
algorithm to perform data association jointly, tracking performance is still degraded for
tracks whose measurement gates overlap.


Table 7.15: PF Multitarget Tracking Accuracy versus $\sigma_n^2$

                                          $\sigma_n^2 = 4$                                              $\sigma_n^2 = 8$
Filter                          Target    $\epsilon_{\mathrm{RMS}}$ (m)   $N_L$   $N_g^{\mathrm{PF}}$    $\epsilon_{\mathrm{RMS}}$ (m)   $N_L$   $N_g^{\mathrm{PF}}$
Generic PF ($N_p = 800$)        1         91.50                           5       1.02                   99.57                           4       1.09
                                2         102.75                          14      1.05                   118.93                          24      1.13
                                3         114.93                          16      1.05                   128.25                          14      1.14
Auxiliary PF ($N_p = 600$)      1         99.44                           5       1.10                   101.09                          3       1.21
                                2         113.65                          22      1.15                   110.85                          22      1.25
                                3         101.18                          15      1.16                   147.91                          15      1.25

Now, we consider the performance of our two particle filters. We discuss track accuracy
first. The probability of detection was set to $P_D = 0.9$ to match the IMM-EKF. However,
as discussed in Section 7.3, the difference in the dimensions of the EKF and PF measurement
spaces prevents us from using the same value for the clutter density. Instead, we chose
$\beta_3 = 2$ because it yielded average gate counts comparable to those of the IMM-EKF
(with $\beta_2 = 25$). The class library size was set to seven in all cases. Therefore, the
particle filter for each target was initialized with approximately $N_p/7$ particles per
class. The right panel of Figure 7.9 shows an example of the filtered estimates produced by
the auxiliary particle filter for each target.

In Table 7.15, tracking results are presented for our generic PF and our auxiliary PF for
two different values of $\sigma_n^2$. Recall that $\sigma_n^2$ is the variance of the
independent, identically distributed Gaussian noise processes that affect the in-phase and
quadrature channels of our receiver (see Section 6.2). In Section 7.2.2, we considered
several different values for this parameter and settled upon $\sigma_n^2 = 8$. However, we
also acknowledged that the literature has little to offer on this choice. In this section, we
return to this issue in order to determine the level of performance that would be achievable
if $\sigma_n^2 = 4$. We expect that a smaller value of $\sigma_n^2$ will lead to improved
performance.

Because the blue trajectory is identical to the roll-roll maneuver, the $\sigma_n^2 = 8$
results for $t_1$ from Table 7.15 can be compared to the ($\beta_3 = 2$, $P_D = 0.9$) entries
from Table 7.9. Recall that the experiments summarized in Table 7.9 assumed that the
identity of the (single) target was known. In this section, that is no longer the case, and
we find that the additional uncertainty due to unknown target class increases the RMS error
for both particle filters. It should be noted, though, that part of the degradation in
performance can be attributed to a reduction in effective particle count per class. For
example, in Section 7.2.2, the generic particle filter devoted $N_p = 600$ particles to a
single class. In this section, the generic PF shares 800 particles among seven classes. Of
course, in the process of resampling, the distribution of particles across classes can change
substantially, and the most likely classes will eventually end up with significantly more
than $N_p/7$ particles. Nonetheless, in the time it takes for a class to attain a high
likelihood and a greater share of the sample set, estimation accuracy is still likely to
suffer.
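The way resampling shifts particles toward the more likely classes can be illustrated with a minimal sketch. It assumes plain multinomial resampling and a hypothetical per-class likelihood vector in which class 1 is four times as likely as each of the other six classes; neither assumption is taken from the simulations above, and the function name is illustrative only.

```python
import random
from collections import Counter


def resample_class_counts(n_particles, class_likelihoods, seed=0):
    """Return per-class particle counts before and after one resampling step.

    Class labels are assigned round-robin, so each class starts with roughly
    n_particles / n_classes particles. Each particle's weight is the
    (hypothetical) likelihood of its class label.
    """
    rng = random.Random(seed)
    n_classes = len(class_likelihoods)
    labels = [i % n_classes for i in range(n_particles)]
    weights = [class_likelihoods[c] for c in labels]
    # Multinomial resampling: draw labels with probability proportional to weight.
    resampled = rng.choices(labels, weights=weights, k=n_particles)
    before = Counter(labels)
    after = Counter(resampled)
    return ([before[c] for c in range(n_classes)],
            [after[c] for c in range(n_classes)])


# 800 particles shared between seven classes, as for the generic PF.
before, after = resample_class_counts(800, [0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
print("before resampling:", before)
print("after resampling: ", after)
```

In this toy setting, the favored class ends up with far more than $N_p/7$ particles after a single step, while the remaining classes are thinned out.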

There are several comments that should be made concerning the results in Table 7.15.
First, on comparing Tables 7.14 and 7.15, we find that both particle filters outperform the
IMM-EKF for all three targets, sometimes by a substantial margin. At first, this might seem
surprising for a multitarget task. After all, the IMM-EKF uses joint data association,
whereas our particle filters use only (single-target) probabilistic data association (see
Section 6.3.2). To confirm that the particle filter gates do overlap, such that joint data
association would be appropriate, consider the average gate counts $N_g^{\mathrm{PF}}$. As
was the case with the IMM-EKF, we find that the gate counts are higher for $t_2$ and $t_3$
than for $t_1$. This implies that measurements from $t_2$ are influencing the track for
$t_3$ and vice versa. So how is it that probabilistic data association can still yield good
performance? There are two reasons for this.

First, by including RCS in the measurement vector, it is more difficult for the return
from a target to satisfy the PF gate criterion for the wrong track. In Table 7.15, the
proximity of $t_2$ and $t_3$ during the experiment results in values of
$N_g^{\mathrm{PF}}$ that are 3-5% higher than those of $t_1$. However, in Table 7.14, the
same trajectories result in values of $N_g^{\mathrm{EKF}}$ for $t_2$ and $t_3$ that are
roughly 10% higher than those of $t_1$. Second, and more importantly, we must recall that
the JPDA is a suboptimal assignment algorithm, even for multiple targets. As discussed in
Section 5.3.3, optimal data association requires that a separate filter be run for every
possible data association hypothesis, which generally leads to a distinctly non-Gaussian
(multimodal) posterior distribution. The JPDA, on the other hand, “re-Gaussianizes” its
estimate of the posterior at each scan. When the measurement gates for two or more targets
overlap, a Gaussian function can become a poor approximation to the true posterior. Thus,
we have the following trade-off.
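The gate-count comparison in the preceding paragraph is simple arithmetic on the tabulated values, and a short sketch makes it easy to check. The numbers below are copied from the average gate count columns of Tables 7.14 and 7.15; the dictionary keys and function names are illustrative only.

```python
def percent_increase(value, reference):
    """Percentage by which value exceeds reference."""
    return 100.0 * (value / reference - 1.0)


# Average gate counts from Tables 7.14 and 7.15, ordered (t1, t2, t3).
GATE_COUNTS = {
    "IMM-EKF": (0.93, 1.03, 1.02),
    "Generic PF, sigma_n^2 = 4": (1.02, 1.05, 1.05),
    "Generic PF, sigma_n^2 = 8": (1.09, 1.13, 1.14),
    "Auxiliary PF, sigma_n^2 = 4": (1.10, 1.15, 1.16),
    "Auxiliary PF, sigma_n^2 = 8": (1.21, 1.25, 1.25),
}


def gate_count_increases(counts=GATE_COUNTS):
    """Map each filter to the percent increase of the t2 and t3 counts over t1."""
    return {name: (percent_increase(t2, t1), percent_increase(t3, t1))
            for name, (t1, t2, t3) in counts.items()}


for name, (inc2, inc3) in gate_count_increases().items():
    print(f"{name}: t2 +{inc2:.1f}%, t3 +{inc3:.1f}%")
```

With these inputs, the particle filter increases fall between about 3% and 5.5%, whereas the IMM-EKF increases are about 10-11%.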

Because our particle filters can accurately represent multimodal distributions, they are
better equipped to handle the measurement ambiguity from overlapping gates. However,
because our PF data association is not performed jointly across all targets, the filters
expect a single persistent return per gate. When this assumption is violated, performance
can degrade. Our IMM-EKF, on the other hand, is able to perform data association jointly.
Thus, when the measurement gates of $t_2$ and $t_3$ overlap, the filter expects two
persistent returns per gate for each track. However, the IMM-EKF cannot model this
ambiguity correctly. Referring to Equation (5.44), we see that the posterior covariance will
be artificially inflated to reflect the measurement uncertainty. This decreased confidence
in the accuracy of its state estimate can lead the EKF to become “distracted” by actual
false returns.
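The effect of collapsing a multimodal posterior to a single Gaussian can be seen in a one-dimensional sketch. It moment-matches a Gaussian mixture, which is the generic operation behind the “re-Gaussianizing” step described above; it does not reproduce Equation (5.44), and the two-hypothesis numbers are purely illustrative.

```python
def moment_match(weights, means, variances):
    """Collapse a 1-D Gaussian mixture to one Gaussian with matching mean and variance."""
    total = sum(weights)
    w = [wi / total for wi in weights]
    mean = sum(wi * mi for wi, mi in zip(w, means))
    # Mixture variance = weighted component variances + spread of the component means.
    var = sum(wi * (vi + (mi - mean) ** 2)
              for wi, mi, vi in zip(w, means, variances))
    return mean, var


# Two equally likely association hypotheses, 200 m apart, each with a 25 m std. dev.
mean, var = moment_match([0.5, 0.5], [-100.0, 100.0], [25.0 ** 2, 25.0 ** 2])
print(f"matched mean = {mean:.1f} m, matched std. dev. = {var ** 0.5:.1f} m")
```

The matched Gaussian is centered midway between the two hypotheses, where neither places much probability, and its variance (10625 m² here) is dominated by the separation of the modes rather than by the accuracy of either one.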

Comparing the performance of our two particle filters to each other in Table 7.15, we
find that the generic particle filter performs slightly better overall. There are two
potential reasons for this. First, note that $N_p = 800$ for the generic PF and
$N_p = 600$ for the auxiliary PF. These particle counts were chosen so that the
computational effort of the two algorithms was more closely matched. In this 7-class
experiment, 600 particles may simply be too few. Second, and more importantly, the proposal
distribution of the APF may be poorly matched to tracking in the presence of clutter and
missed detections. In Section 4.3.4, the APF was shown to resample from the previous states
$\{\mathbf{x}_{k-1}^{(i)}\}$ in order to focus attention on areas of the state space that
seem most promising given $\mathbf{z}_k$, the current measurement vector. While this might
be a good idea in a clutter-free environment, in the presence of data uncertainty, the APF
can end up wasting many of its samples “chasing” false returns. If we compare
$N_g^{\mathrm{PF}}$ for both types of particle filters, we find that the average gate count
for the APF is indeed higher than that of the generic PF.

Finally, comparing the tracking accuracy versus the choice of $\sigma_n^2$ in Table 7.15,
we find that, as expected, both particle filters generally perform better at the lower noise
variance ($\sigma_n^2 = 4$). There are two reasons for this. First, the values of
$N_g^{\mathrm{PF}}$ are smaller for $\sigma_n^2 = 4$ than for $\sigma_n^2 = 8$. As the
scattering coefficient noise variance decreases, it becomes more difficult for false returns
to satisfy the gate condition. The resulting reduction in data uncertainty leads to an
improvement in tracking performance. Second, classification accuracy also improves as
$\sigma_n^2$ decreases. We will discuss the matter of classification next, but for now we
emphasize that tracking and classification proceed jointly in our particle filters. More
specifically, those particles whose class labels are incorrect use the wrong kinematic
coefficients during prediction. This, of course, leads to a poor match with the actual
dynamics of the target. Therefore, anything that degrades classification performance will
also degrade tracking accuracy (and vice versa).

We now consider the classification performance of both particle filters. Recall that, for
this experiment, $t_1$ is a class-1 aircraft, $t_2$ is a class-2 aircraft, and $t_3$ is a
class-3 aircraft. The right panel of Figure 7.9 sheds some light on the classification
process. In this plot, a black square is used to identify those scans where the MAP
estimates derived using (7.13) were incorrect. Because each class is equally likely at
initialization, we expect to find black squares at the beginning of each track. This is
exactly what we see in Figure 7.9. Therefore, to prevent initialization from diluting our
classification results, we ignore the first 10 scans in computing the average class
probabilities for each target (i.e., we take $k_0 = 11$ in Equation (7.14)). This is the same
approach that was used in Section 7.4.
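The bookkeeping described above can be sketched as follows. Equations (7.13) and (7.14) are not reproduced in this section, so the sketch assumes their usual form: the class probability at a scan is the sum of the normalized weights of the particles carrying that class label, the MAP class is the most probable one, and the average is taken over scans $k_0$ onward. All function names are hypothetical.

```python
def class_probabilities(labels, weights, n_classes):
    """Per-scan class probabilities: summed normalized weights for each class label."""
    total = sum(weights)
    probs = [0.0] * n_classes
    for c, w in zip(labels, weights):
        probs[c] += w / total
    return probs


def map_class(probs):
    """Index of the most probable class (assumed form of the MAP estimate)."""
    return max(range(len(probs)), key=probs.__getitem__)


def average_class_probabilities(per_scan_probs, k0):
    """Average class probabilities over scans k0, k0 + 1, ... (scans are 1-indexed)."""
    kept = per_scan_probs[k0 - 1:]
    return [sum(p[c] for p in kept) / len(kept) for c in range(len(kept[0]))]


# Ten uninformative scans after initialization, followed by five confident scans.
scans = [[1 / 3, 1 / 3, 1 / 3]] * 10 + [[0.8, 0.1, 0.1]] * 5
print(average_class_probabilities(scans, k0=1))   # diluted by initialization
print(average_class_probabilities(scans, k0=11))  # first 10 scans ignored
```

Averaging from the first scan blends the uniform initial probabilities into the result, which is the dilution that the choice $k_0 = 11$ avoids.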

The classification results for our generic particle filter and auxiliary particle filter
are presented in Tables 7.16 and 7.17, respectively. To quantify the effect that data
uncertainty has on classification performance, we refer back to Tables 7.12 and 7.13. Recall
that the simulations in Section 7.4 were conducted for a single target in a clutter-free
environment with zero probability of missed detection. However, the scattering coefficient
noise variance was taken to be $\sigma_n^2 = 8$. Thus, the bottom three rows of Tables 7.16
and 7.17 may be directly compared to the top three rows of Tables 7.12 and 7.13. For both
particle filters, we find that the average probability of class 1 is unaffected by the
measurement uncertainty. The same is not true, though, for classes 2 and 3, whose
probabilities decrease significantly. While we have already determined that the gates of
$t_2$ and $t_3$ overlap for part of their trajectories, it is unlikely that this is the
primary cause of the degraded performance. If it were, we would expect to see $t_2$
confused much more often for class 3 (and $t_3$ for class 2) in Tables 7.16 and 7.17.
Instead, it is more likely that the presence of false returns and missed detections is
simply taking its toll on the classifier.


Table 7.16: Multitarget Confusion Matrix for Generic PF ($N_p = 800$)

                                        Average Class Probability
                      Correct Class     1      2      3      4      5      6      7
$\sigma_n^2 = 4$      $t_1$: 1          .933   .001   .000   .019   .006   .031   .011
                      $t_2$: 2          .000   .870   .000   .012   .000   .003   .114
                      $t_3$: 3          .002   .000   .934   .001   .014   .050   .000
$\sigma_n^2 = 8$      $t_1$: 1          .683   .014   .001   .089   .114   .050   .050
                      $t_2$: 2          .014   .855   .000   .022   .001   .024   .085
                      $t_3$: 3          .001   .007   .852   .010   .030   .092   .009


Table 7.17: Multitarget Confusion Matrix for Auxiliary PF ($N_p = 600$)

                                        Average Class Probability
                      Correct Class     1      2      3      4      5      6      7
$\sigma_n^2 = 4$      $t_1$: 1          .830   .001   .000   .039   .032   .057   .040
                      $t_2$: 2          .002   .898   .001   .001   .000   .004   .094
                      $t_3$: 3          .013   .001   .858   .023   .047   .058   .000
$\sigma_n^2 = 8$      $t_1$: 1          .726   .019   .000   .055   .075   .063   .061
                      $t_2$: 2          .017   .842   .002   .024   .001   .018   .096
                      $t_3$: 3          .012   .002   .808   .004   .094   .079   .001

In terms of class accuracy, the auxiliary PF does not seem to offer an advantage over the
generic PF, which uses the prior as its proposal distribution. This is not surprising,
because we have already theorized that the proposal distribution of the APF may actually do
more harm than good in the presence of false returns. What is more significant in Tables
7.16 and 7.17 is the improvement in class accuracy achieved by reducing $\sigma_n^2$ from 8
to 4. For example, the average class probability of class 1 goes from 0.683 to 0.933 for the
generic PF. Because RCS is the primary feature for class discrimination, it makes sense that
more accurate RCS measurements should translate into improved recognition performance.¹¹

In summary, we have demonstrated that our particle filters are able to track and classify
multiple targets in the presence of both clutter and missed detections. In addition, both
filters were shown to outperform the IMM-EKF. There are three main reasons for the
improvement in tracking accuracy achieved by the particle filter. First, its ability to
model non-Gaussian densities makes it robust in cluttered environments. Second, its ability
to model nonlinear systems of almost arbitrary complexity allows us to use an
aerodynamically valid flight model. Third, its ability to incorporate RCS as a data feature
permits classification to occur simultaneously, which provides the class-specific kinematic
coefficients $A_c$ needed for our flight model.

¹¹ Target dynamics, via the kinematic coefficients $A_c$ (which include the wing area $S$
and the vehicle mass $m$), would be a secondary feature for class discrimination.

CHAPTER 8


CONCLUSIONS


In  this  thesis,  we  have  presented  a  recursive  Bayesian  formulation  for  joint  tracking  and 
classification.  The  specific  application  we  considered  was  FM  radio-based  passive  air 
surveillance.  For  our  simulations  to  be  as  realistic  as  possible,  we  modeled  false  alarms, 
missed  detections,  and  targets  with  overlapping  measurement  gates.  Most  importantly, 
we  established  a  legitimate  link  between  classification  and  tracking  by  implementing  an 
aerodynamically  valid  flight  model,  which  requires  vehicle-specific  coefficients  such  as 
the  target’s  wing  area,  minimum  drag,  and  mass.  Of  course,  our  classifier  relies  on  the 
tracker  for  position  and  orientation  information,  so  that  RCS  values  can  be  predicted. 
Thus,  there  is  an  important  two-way  exchange  of  information  between  tracker  and 
classifier. 

The key feature that bridges the gap between tracking and classification is radar cross
section, which we incorporated in our measurement vector. By modeling the true
deterministic relationship that exists between RCS and target aspect, we were able to gain
both valuable class information and a method for estimating target orientation. However,
the lack of a closed-form relationship between RCS and target aspect required that we
implement our system using a particle filter.

In Chapter 7, we demonstrated that our particle filter-based system was indeed able to
track and classify multiple targets in the presence of both clutter and missed detections.
In addition, its tracking accuracy was shown to be superior to that of the IMM-EKF. There
are three main reasons for the improvement in particle filter performance. First, its
ability to model non-Gaussian densities makes it robust in cluttered environments. Second,
its ability to model nonlinear systems of almost arbitrary complexity allows us to use our
aerodynamically valid flight model. Third, its ability to incorporate RCS as a data feature
makes our entire joint approach feasible.

The one area where our implementation did not excel was that of computational complexity.
In our simulations, our particle filter was found to be two orders of magnitude slower than
our IMM-EKF. As such, a future area of research could concentrate on ways to achieve the
same performance with fewer particles. For example, $N_p$ could be allowed to vary based
upon the effective sample size (or an estimate of it). A second possible area for future
research involves the issue of robustness during classification. If an unusually noisy RCS
measurement is received, particles corresponding to the true class may score poorly during
the measurement update. If, during resampling, many of them are eliminated, the filter may
then diverge. It would be useful to guard against this scenario, possibly by repopulating
classes that have been eliminated. However, care would have to be taken that the sample
weights were adjusted appropriately so that a valid estimate of the posterior was
maintained.

APPENDIX A


DEGENERACY  OF 
IMPORTANCE  WEIGHTS 


In this appendix, we provide a proof for Lemma 3. We show that the unconditional variance
of the importance weights from the SIS algorithm cannot decrease with time.¹ The
significance of this result is that, practically speaking, there is no way to avoid the
problem of degeneracy in sequential importance sampling. Thus, without resampling, the
importance weights will degrade until eventually $w_k^{(i)} \approx 0$ for all but a single
value of $i$.
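Degeneracy is easy to observe numerically. The following minimal sketch runs sequential importance sampling without resampling on an assumed scalar random-walk model with unit-variance process and measurement noise, using the prior as the proposal. The model, the particle count, and the use of the common estimate $(\sum_i w^{(i)})^2 / \sum_i (w^{(i)})^2$ of the effective sample size are illustrative choices, not the setup used in the simulations of Chapter 7.

```python
import math
import random


def sis_effective_sample_sizes(n_particles=100, n_steps=50, seed=1):
    """Run SIS without resampling and return the estimated N_eff after each scan."""
    rng = random.Random(seed)
    truth = 0.0
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    history = []
    for _ in range(n_steps):
        # Simulate the true state and a noisy measurement of it.
        truth += rng.gauss(0.0, 1.0)
        z = truth + rng.gauss(0.0, 1.0)
        # Propose from the state transition density (the prior); no resampling.
        particles = [x + rng.gauss(0.0, 1.0) for x in particles]
        # Weight update: multiply by the likelihood p(z | x), done in the log domain.
        log_w = [lw - 0.5 * (z - x) ** 2 for lw, x in zip(log_w, particles)]
        peak = max(log_w)
        w = [math.exp(lw - peak) for lw in log_w]
        total = sum(w)
        history.append(total * total / sum(wi * wi for wi in w))
    return history


history = sis_effective_sample_sizes()
print(f"estimated N_eff after scan 1:  {history[0]:.1f}")
print(f"estimated N_eff after scan 50: {history[-1]:.1f}")
```

Because the weights are never reset, the estimated effective sample size collapses toward one as the scans accumulate, which is the practical symptom of the result proved in this appendix.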

Comparing Equations (4.10) and (4.14), we have
$w_k^{*(i)} = w_k^{(i)} / p(\mathbf{z}_{1:k})$. When implementing the SIS algorithm, the true
importance weights $\{w_k^{*(i)}\}$ are hardly ever used because they require evaluation of
$p(\mathbf{z}_{1:k})$. However, because $\mathbf{z}_{1:k}$ is considered random in this
proof, we must proceed with the true weights. Also, for simplicity, we suppress the
superscript $i$ for both $w_k^*$ and the imputed state sequence $\mathbf{x}_{0:k}$ in what
follows. The proof combines arguments from [53,78].

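We begin by showing that $w_k^*$ is a martingale in $k$. Define the auxiliary distribution

\begin{align}
\pi(\mathbf{x}_k, \mathbf{z}_k \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1})
  &= \pi(\mathbf{x}_k \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k})\,
     p(\mathbf{z}_k \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1}) \tag{A.1} \\
  &= \pi(\mathbf{x}_k \mid \mathbf{x}_{k-1}, \mathbf{z}_k)\,
     p(\mathbf{z}_k \mid \mathbf{z}_{1:k-1}). \tag{A.2}
\end{align}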

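In words, $\pi(\mathbf{x}_k, \mathbf{z}_k \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1})$ is the
distribution from which the current state and measurement are drawn, conditioned on the
previous state and measurement histories. In going from (A.1) to (A.2), we use the fact that
the measurement $\mathbf{z}_k$ is not drawn from our proposal distribution. Rather, it is
drawn from the true distribution governing the observation process. Furthermore, because
the state sequence $\mathbf{x}_{0:k-1}^{(i)}$ (where we have suppressed the superscript $i$
in (A.1)) is imputed from $\mathbf{z}_{1:k-1}$, and $\mathbf{x}_0$ is independent of the
measurement process, we have
$p(\mathbf{z}_k \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1}) = p(\mathbf{z}_k \mid \mathbf{z}_{1:k-1})$.
Applying the definition in (4.10), we have

\begin{align}
E_\pi[w_k^* \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1}]
  &= \iint \frac{p(\mathbf{z}_{1:k} \mid \mathbf{x}_{0:k})\, p(\mathbf{x}_{0:k})}
                {p(\mathbf{z}_{1:k})\, \pi(\mathbf{x}_{0:k} \mid \mathbf{z}_{1:k})}\,
     \pi(\mathbf{x}_k, \mathbf{z}_k \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1})\,
     d\mathbf{x}_k\, d\mathbf{z}_k \notag \\
  &= w_{k-1}^* \iint \frac{p(\mathbf{z}_k \mid \mathbf{x}_k)\, p(\mathbf{x}_k \mid \mathbf{x}_{k-1})}
                          {p(\mathbf{z}_k \mid \mathbf{z}_{1:k-1})\,
                           \pi(\mathbf{x}_k \mid \mathbf{x}_{k-1}, \mathbf{z}_k)}\,
     \pi(\mathbf{x}_k, \mathbf{z}_k \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1})\,
     d\mathbf{x}_k\, d\mathbf{z}_k \tag{A.3} \\
  &= w_{k-1}^* \iint p(\mathbf{z}_k \mid \mathbf{x}_k)\, p(\mathbf{x}_k \mid \mathbf{x}_{k-1})\,
     d\mathbf{x}_k\, d\mathbf{z}_k \tag{A.4} \\
  &= w_{k-1}^* \iint p(\mathbf{x}_k, \mathbf{z}_k \mid \mathbf{x}_{k-1})\,
     d\mathbf{x}_k\, d\mathbf{z}_k \tag{A.5} \\
  &= w_{k-1}^*. \tag{A.6}
\end{align}

To arrive at (A.3), we used the fact that $\mathbf{x}_{0:k}$ is Markov and $\mathbf{z}_{1:k}$
is conditionally independent given $\mathbf{x}_{0:k}$. This allowed us to extract the
previous weight $w_{k-1}^*$ from the integral. We also applied (4.25) in order to factor the
proposal distribution $\pi(\mathbf{x}_{0:k} \mid \mathbf{z}_{1:k})$. Substitution of the
definition of the auxiliary distribution (A.2) then yields (A.4). Therefore, we have shown
that $w_k^*$ is a martingale in $k$.

¹ This statement applies in a stochastic sense, where the unconditional variance is taken
with the measurement sequence $\mathbf{z}_{1:k}$ treated as random.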

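Using the fact that $w_k^*$ is a martingale and the variance decomposition

\[
\mathrm{var}[X] = \mathrm{var}\big[E[X \mid Y]\big] + E\big[\mathrm{var}[X \mid Y]\big], \tag{A.7}
\]

we can now prove the lemma.

\begin{align}
\mathrm{var}_{\pi(\mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1})}[w_{k-1}^*]
  &= \mathrm{var}_{\pi(\mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1})}
     \big[E_\pi[w_k^* \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1}]\big] \tag{A.8} \\
  &= \mathrm{var}_{\pi(\mathbf{x}_{0:k}, \mathbf{z}_{1:k})}[w_k^*]
     - E\big[\mathrm{var}_\pi[w_k^* \mid \mathbf{x}_{0:k-1}, \mathbf{z}_{1:k-1}]\big] \tag{A.9} \\
  &\le \mathrm{var}_{\pi(\mathbf{x}_{0:k}, \mathbf{z}_{1:k})}[w_k^*], \tag{A.10}
\end{align}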

where we used (A.6) to obtain (A.8), and the variance decomposition (A.7) to obtain (A.9).
Thus, we have obtained the result that the unconditional variance of $w_k^*$ cannot
decrease with $k$. Of course, for a given experiment, the measurement sequence
$\mathbf{z}_{1:k}$ is fixed, so we might wonder how the lemma should be interpreted. Using
the variance decomposition once again, we have

\[
\mathrm{var}[w_k^*] = \mathrm{var}\big[E[w_k^* \mid \mathbf{z}_{1:k}]\big]
  + E\big[\mathrm{var}[w_k^* \mid \mathbf{z}_{1:k}]\big]. \tag{A.11}
\]

Then, because

\[
E_\pi[w_k^* \mid \mathbf{z}_{1:k}]
  = \int \frac{p(\mathbf{x}_{0:k} \mid \mathbf{z}_{1:k})}{\pi(\mathbf{x}_{0:k} \mid \mathbf{z}_{1:k})}\,
    \pi(\mathbf{x}_{0:k} \mid \mathbf{z}_{1:k})\, d\mathbf{x}_{0:k} = 1, \tag{A.12}
\]

substitution of (A.12) into (A.11) yields
$E[\mathrm{var}[w_k^* \mid \mathbf{z}_{1:k}]] = \mathrm{var}[w_k^*]$. Therefore, for a given
measurement sequence, the conditional variance of the true importance weights will tend to
increase over time (although not strictly so), resulting in the degeneracy of the posterior
estimates.

APPENDIX B


DERIVATION  OF  EFFECTIVE 
SAMPLE  SIZE 


In this appendix, we derive the formula for the effective sample size. Although many
sequential importance sampling algorithms use an estimate of $N_{\mathrm{eff}}$ to determine
when to resample, a complete derivation of $N_{\mathrm{eff}}$ is often neglected. The
following derivation is based upon the discussion in [48]. For notational convenience, we
will suppress the dependence on the time index $k$.

Consider the state vector $\mathbf{x} \sim p(\mathbf{x} \mid \mathbf{z})$. We wish to estimate

\[
\mu_g = E_p[g(\mathbf{x}) \mid \mathbf{z}]
\]

by Monte Carlo integration, where it is assumed that
$\mathrm{var}_p[g(\mathbf{x}) \mid \mathbf{z}] < \infty$. If we are unable to draw samples
directly from $p(\mathbf{x} \mid \mathbf{z})$, we can estimate $\mu_g$ using importance
sampling. Specifically, let $\pi(\mathbf{x} \mid \mathbf{z})$ denote the importance density
from which we draw $N_p$ independent samples
$\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N_p)}\}$. Applying (4.15) from
Section 4.1.3,

\[
\hat{\mu}_g = \frac{\sum_{i=1}^{N_p} w^{(i)} g(\mathbf{x}^{(i)})}{\sum_{j=1}^{N_p} w^{(j)}}
            = \frac{\sum_{i=1}^{N_p} w^{*(i)} g(\mathbf{x}^{(i)})}{\sum_{j=1}^{N_p} w^{*(j)}} \tag{B.1}
\]

provides an asymptotically unbiased estimate of $\mu_g$ (under mild restrictions).¹ The
second equality in (B.1) follows from Equations (4.10) and (4.14), which yield
$w^{(i)} = p(\mathbf{z})\, w^{*(i)}$. Note that the equivalence between the two estimates in
(B.1) only makes sense for a fixed realization of the measurement process (i.e., if
$\mathbf{z}$ is not random). In this case, the true importance weights $\{w^{*(i)}\}$ and the
unnormalized weights $\{w^{(i)}\}$ differ only by a constant. While the first form of
$\hat{\mu}_g$ would be used in practice, the derivation of $N_{\mathrm{eff}}$ is simpler if
we use the second form.

We wish to choose $N_p$ large enough so that the random variation created by our Monte
Carlo approximation has a negligible effect on the fidelity of our estimate of $\mu_g$. To
accomplish this, we require that $\mathrm{var}_\pi[\hat{\mu}_g \mid \mathbf{z}]$ be small
relative to the true variance $\mathrm{var}_p[g(\mathbf{x}) \mid \mathbf{z}]$, as suggested in
[53]. We begin by finding an approximation for the variance of $\hat{\mu}_g$.

¹ In this section, we adopt a more compact notation than Section 4.1.3. Specifically, we
replace $I(g)$ with $\mu_g$ and $\hat{I}_{N_p}(g)$ with $\hat{\mu}_g$, thereby suppressing
the dependence on $N_p$ because the sample size will not vary in the current discussion.

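Let $y^{(i)} = w^{*(i)} g(\mathbf{x}^{(i)})$. An approximation for the variance of
$\hat{\mu}_g$ can be found by replacing the quotient in (B.1) with a first-order Taylor
series approximation in the random variables
$w^{*(1)}, \ldots, w^{*(N_p)}, y^{(1)}, \ldots, y^{(N_p)}$. In the statistics literature,
this approximation technique is referred to as the delta method. The variance of the
resulting linearized function will have terms up to second order only. Before proceeding,
for $i = 1, \ldots, N_p$, we note that

\begin{align}
E_\pi[w^{*(i)} \mid \mathbf{z}]
  &= \int \frac{p(\mathbf{x} \mid \mathbf{z})}{\pi(\mathbf{x} \mid \mathbf{z})}\,
     \pi(\mathbf{x} \mid \mathbf{z})\, d\mathbf{x}
   = \int p(\mathbf{x} \mid \mathbf{z})\, d\mathbf{x} = 1, \tag{B.2} \\
E_\pi[y^{(i)} \mid \mathbf{z}]
  &= \int g(\mathbf{x})\, \frac{p(\mathbf{x} \mid \mathbf{z})}{\pi(\mathbf{x} \mid \mathbf{z})}\,
     \pi(\mathbf{x} \mid \mathbf{z})\, d\mathbf{x}
   = \int g(\mathbf{x})\, p(\mathbf{x} \mid \mathbf{z})\, d\mathbf{x} = \mu_g, \tag{B.3}
\end{align}

where we assume that the support of $p(\mathbf{x} \mid \mathbf{z})$ is a subset of the
support of $\pi(\mathbf{x} \mid \mathbf{z})$. We now linearize $\hat{\mu}_g$ about the
expected values of $\{y^{(i)}\}$ and $\{w^{*(i)}\}$.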


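\begin{align}
\hat{\mu}_g = \frac{\sum_{i=1}^{N_p} y^{(i)}}{\sum_{j=1}^{N_p} w^{*(j)}}
  &\approx \left.\frac{\sum_{i} y^{(i)}}{\sum_{j} w^{*(j)}}
     \right|_{y^{(i)} = \mu_g,\; w^{*(i)} = 1}
   + \left.\frac{1}{\sum_{j} w^{*(j)}}\right|_{w^{*(i)} = 1}
     \sum_{i=1}^{N_p} \big(y^{(i)} - \mu_g\big)
   - \left.\frac{\sum_{i} y^{(i)}}{\big(\sum_{j} w^{*(j)}\big)^2}
     \right|_{y^{(i)} = \mu_g,\; w^{*(i)} = 1}
     \sum_{i=1}^{N_p} \big(w^{*(i)} - 1\big) \notag \\
  &= \mu_g + \frac{1}{N_p} \sum_{i=1}^{N_p} \big(y^{(i)} - \mu_g\big)
     - \frac{\mu_g}{N_p} \sum_{i=1}^{N_p} \big(w^{*(i)} - 1\big). \tag{B.4}
\end{align}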


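We will denote the linearized estimate in (B.4) as $\hat{\mu}_{g,\mathrm{lin}}$. Because
$E_\pi[y^{(i)} \mid \mathbf{z}] = \mu_g$ and $E_\pi[w^{*(i)} \mid \mathbf{z}] = 1$ for all $i$,
we immediately have $E_\pi[\hat{\mu}_{g,\mathrm{lin}} \mid \mathbf{z}] = \mu_g$. To find
$E_\pi[\hat{\mu}_{g,\mathrm{lin}}^2 \mid \mathbf{z}]$, we note that, because the sequence
$\{\mathbf{x}^{(i)}\}$ is independent and identically distributed (iid), the sequences
$\{w^{*(i)}\}$ and $\{y^{(i)}\}$ are also iid. Furthermore, $w^{*(i)}$ and $y^{(j)}$ are
independent for $i \ne j$. The latter is not true for $i = j$, though, because $w^{*(i)}$
and $y^{(i)}$ are then both functions of the same random variable $\mathbf{x}^{(i)}$. We can
now show

\begin{align}
E_\pi[\hat{\mu}_{g,\mathrm{lin}}^2 \mid \mathbf{z}]
  &= E_\pi\!\left[\left(\mu_g + \frac{1}{N_p} \sum_{i=1}^{N_p} \big(y^{(i)} - \mu_g\big)
     - \frac{\mu_g}{N_p} \sum_{i=1}^{N_p} \big(w^{*(i)} - 1\big)\right)^{\!2}
     \,\middle|\, \mathbf{z}\right] \notag \\
  &= \mu_g^2 + \frac{1}{N_p^2} \sum_{i=1}^{N_p}
       E_\pi\big[(y^{(i)} - \mu_g)^2 \mid \mathbf{z}\big]
     - \frac{2\mu_g}{N_p^2} \sum_{i=1}^{N_p}
       E_\pi\big[(y^{(i)} - \mu_g)(w^{*(i)} - 1) \mid \mathbf{z}\big] \notag \\
  &\quad + \frac{\mu_g^2}{N_p^2} \sum_{i=1}^{N_p}
       E_\pi\big[(w^{*(i)} - 1)^2 \mid \mathbf{z}\big], \tag{B.5}
\end{align}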


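where we have used the independence of the various random variables to eliminate all double
sums. Because $\{w^{*(i)}\}$ and $\{y^{(i)}\}$ are identically distributed sequences, (B.5)
can be expressed more simply as

\[
E_\pi[\hat{\mu}_{g,\mathrm{lin}}^2 \mid \mathbf{z}]
  = \mu_g^2 + \frac{1}{N_p}\, \mathrm{var}_\pi[y \mid \mathbf{z}]
    - \frac{2\mu_g}{N_p}\, \mathrm{cov}_\pi[y, w^* \mid \mathbf{z}]
    + \frac{\mu_g^2}{N_p}\, \mathrm{var}_\pi[w^* \mid \mathbf{z}]. \tag{B.6}
\]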


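Combining (B.6) with $E_\pi[\hat{\mu}_{g,\mathrm{lin}} \mid \mathbf{z}] = \mu_g$ yields the
desired approximation.

\begin{align}
\mathrm{var}_\pi[\hat{\mu}_{g,\mathrm{lin}} \mid \mathbf{z}]
  &= E_\pi[\hat{\mu}_{g,\mathrm{lin}}^2 \mid \mathbf{z}]
     - E_\pi^2[\hat{\mu}_{g,\mathrm{lin}} \mid \mathbf{z}], \notag \\
\mathrm{var}_\pi[\hat{\mu}_g \mid \mathbf{z}]
  &\approx \frac{1}{N_p} \Big(\mathrm{var}_\pi[y \mid \mathbf{z}]
     - 2\mu_g\, \mathrm{cov}_\pi[y, w^* \mid \mathbf{z}]
     + \mu_g^2\, \mathrm{var}_\pi[w^* \mid \mathbf{z}]\Big). \tag{B.7}
\end{align}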

Our linearized approximation indicates that if all the terms in (B.7) are finite, the Monte
Carlo variation in $\hat{\mu}_g$ should decrease as $N_p^{-1}$. Unfortunately, because
$y = w^*(\mathbf{x})\, g(\mathbf{x})$, our approximation for
$\mathrm{var}_\pi[\hat{\mu}_g \mid \mathbf{z}]$ is dependent on the choice of the function
$g$. Ideally, we would like to derive an effective sample size that could be applied over a
broad choice of $g$. We also wish to take all expectations involving $g(\mathbf{x})$ with
respect to the true density $p(\mathbf{x} \mid \mathbf{z})$ and not the importance density
$\pi(\mathbf{x} \mid \mathbf{z})$. To accomplish both of these, we need to make a further
approximation.
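The delta-method approximation (B.7) can be checked numerically on a toy problem. The sketch below assumes $p(x \mid z) = N(0, 1)$, $\pi(x \mid z) = N(0, 2^2)$, and $g(x) = x + 1$, so that $\mu_g = 1$ and $w^*(x) = 2\exp(-3x^2/8)$; these choices, the sample sizes, and the function name are illustrative and are not taken from the thesis. It compares the empirical variance of the self-normalized estimate over many trials with the right-hand side of (B.7), evaluated from sample moments under $\pi$.

```python
import math
import random


def delta_method_check(n_p=200, n_trials=2000, seed=0):
    """Return (empirical variance of the estimate, variance predicted by (B.7))."""
    rng = random.Random(seed)
    mu_g = 1.0  # E_p[g(x)] for g(x) = x + 1 with p = N(0, 1)
    n = 0
    s_y = s_w = s_yy = s_ww = s_yw = 0.0  # running sums for moments under pi
    s_est = s_est2 = 0.0                  # running sums over the trial estimates
    for _ in range(n_trials):
        num = den = 0.0
        for _ in range(n_p):
            x = rng.gauss(0.0, 2.0)             # sample from the importance density
            w = 2.0 * math.exp(-0.375 * x * x)  # true weight w* = p(x) / pi(x)
            y = w * (x + 1.0)                   # y = w* g(x)
            num += y
            den += w
            s_y += y
            s_w += w
            s_yy += y * y
            s_ww += w * w
            s_yw += y * w
            n += 1
        est = num / den                         # self-normalized estimate (B.1)
        s_est += est
        s_est2 += est * est
    empirical = s_est2 / n_trials - (s_est / n_trials) ** 2
    m_y, m_w = s_y / n, s_w / n
    var_y = s_yy / n - m_y ** 2
    var_w = s_ww / n - m_w ** 2
    cov_yw = s_yw / n - m_y * m_w
    predicted = (var_y - 2.0 * mu_g * cov_yw + mu_g ** 2 * var_w) / n_p
    return empirical, predicted


empirical, predicted = delta_method_check()
print(f"empirical variance: {empirical:.5f}")
print(f"(B.7) prediction:   {predicted:.5f}")
```

For this well-behaved example, the two values should agree closely; the agreement can be expected to deteriorate for small $N_p$ or for importance densities with much lighter tails than $p$.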

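We begin by expanding $\mathrm{var}_\pi[y \mid \mathbf{z}]$ from (B.7). To simplify notation,
we also replace $g(\mathbf{x})$ with $g$.

\begin{align}
\mathrm{var}_\pi[y \mid \mathbf{z}]
  &= E_\pi\!\left[\left(\frac{p(\mathbf{x} \mid \mathbf{z})}{\pi(\mathbf{x} \mid \mathbf{z})}\right)^{\!2}
       g^2(\mathbf{x}) \,\middle|\, \mathbf{z}\right]
     - E_\pi^2\!\left[\frac{p(\mathbf{x} \mid \mathbf{z})}{\pi(\mathbf{x} \mid \mathbf{z})}\,
       g(\mathbf{x}) \,\middle|\, \mathbf{z}\right] \notag \\
  &= E_p\!\left[\frac{p(\mathbf{x} \mid \mathbf{z})}{\pi(\mathbf{x} \mid \mathbf{z})}\,
       g^2(\mathbf{x}) \,\middle|\, \mathbf{z}\right]
     - E_p^2[g(\mathbf{x}) \mid \mathbf{z}] \notag \\
  &= E_p[g^2 w^* \mid \mathbf{z}] - \mu_g^2. \tag{B.8}
\end{align}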


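We now replace $g^2 w^*$ by a second-order Taylor series approximation about the expected
values of $w^*$ and $g$ with respect to $p(\mathbf{x} \mid \mathbf{z})$. To simplify notation,
define $\mu_w = E_p[w^* \mid \mathbf{z}]$. Note that, while $E_\pi[w^* \mid \mathbf{z}] = 1$,
the same is not generally true for $\mu_w$.

\begin{align}
g^2 w^* &\approx g^2 w^*\big|_{g = \mu_g,\, w^* = \mu_w}
   + 2 g w^*\big|_{g = \mu_g,\, w^* = \mu_w} (g - \mu_g)
   + g^2\big|_{g = \mu_g} (w^* - \mu_w) \notag \\
  &\quad + \tfrac{1}{2}\Big(2 w^*\big|_{w^* = \mu_w} (g - \mu_g)^2
   + 2 \cdot 2 g\big|_{g = \mu_g} (g - \mu_g)(w^* - \mu_w)\Big) \notag \\
  &= \mu_g^2 \mu_w + 2 \mu_g \mu_w (g - \mu_g) + \mu_g^2 (w^* - \mu_w)
   + \mu_w (g - \mu_g)^2 + 2 \mu_g (g - \mu_g)(w^* - \mu_w). \tag{B.9}
\end{align}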


Taking the conditional expectation of the second-order approximation in (B.9) with respect
to the true density and substituting the result back into (B.8) yields

\[
\mathrm{var}_\pi[y \mid \mathbf{z}] \approx \mu_g^2 \mu_w
  + \mu_w\, \mathrm{var}_p[g \mid \mathbf{z}]
  + 2 \mu_g\, \mathrm{cov}_p[g, w^* \mid \mathbf{z}] - \mu_g^2. \tag{B.10}
\]

Thus, we have succeeded in approximating $\mathrm{var}_\pi[y \mid \mathbf{z}]$ using only
expectations taken with respect to $p(\mathbf{x} \mid \mathbf{z})$, the true density. We now
accomplish the same for the second term from (B.7).


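\begin{align}
\mathrm{cov}_\pi[y, w^* \mid \mathbf{z}]
  &= E_\pi[y w^* \mid \mathbf{z}] - E_\pi[y \mid \mathbf{z}]\, E_\pi[w^* \mid \mathbf{z}] \notag \\
  &= E_\pi\!\left[g \left(\frac{p(\mathbf{x} \mid \mathbf{z})}{\pi(\mathbf{x} \mid \mathbf{z})}\right)^{\!2}
       \,\middle|\, \mathbf{z}\right]
     - E_\pi\!\left[g\, \frac{p(\mathbf{x} \mid \mathbf{z})}{\pi(\mathbf{x} \mid \mathbf{z})}
       \,\middle|\, \mathbf{z}\right]
       E_\pi\!\left[\frac{p(\mathbf{x} \mid \mathbf{z})}{\pi(\mathbf{x} \mid \mathbf{z})}
       \,\middle|\, \mathbf{z}\right] \notag \\
  &= E_p[g w^* \mid \mathbf{z}] - \mu_g \cdot 1 \notag \\
  &= \mathrm{cov}_p[g, w^* \mid \mathbf{z}] + \mu_g \mu_w - \mu_g. \tag{B.11}
\end{align}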


To arrive at the final result, we substitute (B.10) and (B.11) back into Equation (B.7)
and combine terms.

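$$
\begin{aligned}
\mathrm{var}_\pi[\hat{\mu}_g|z] &\approx \frac{1}{N_p}\left(\mathrm{var}_\pi[y|z] - 2\mu_g\,\mathrm{cov}_\pi[y,w^*|z] + \mu_g^2\,\mathrm{var}_\pi[w^*|z]\right) \\
&\approx \frac{1}{N_p}\left(\mu_w\,\mathrm{var}_p[g|z] - \mu_g^2\mu_w + \mu_g^2 + \mu_g^2\,\mathrm{var}_\pi[w^*|z]\right) \\
&= \frac{1}{N_p}\left(E_\pi[w^{*2}|z]\,\mathrm{var}_p[g|z] + \mu_g^2\,\mathrm{var}_\pi[w^*|z] - \mu_g^2\left(E_\pi[w^{*2}|z] - 1\right)\right).
\end{aligned}
\tag{B.12}
$$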


In (B.12), we used the result that $E_\pi[w^{*2}|z] = E_p[w^*|z] = \mu_w$. Because $E_\pi[w^*|z] = 1$, we recognize that $(E_\pi[w^{*2}|z] - 1) = \mathrm{var}_\pi[w^*|z]$. Thus, the second and third terms
in (B.12) cancel, leaving us with

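$$
\begin{aligned}
\mathrm{var}_\pi[\hat{\mu}_g|z] &\approx \frac{1}{N_p}\,E_\pi[w^{*2}|z]\,\mathrm{var}_p[g|z] \\
&= \frac{1}{N_p}\,\mathrm{var}_p[g|z]\left(\mathrm{var}_\pi[w^*|z] + 1\right).
\end{aligned}
\tag{B.13}
$$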
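
To give a feel for the size of the inflation factor $E_\pi[w^{*2}|z] = \mathrm{var}_\pi[w^*|z] + 1$ in (B.13), the following worked case may help. It is an illustrative assumption and not an example taken from this thesis: a scalar state with true density $p = \mathcal{N}(0,1)$ and importance density $\pi = \mathcal{N}(0,\sigma^2)$, with the conditioning on $z$ suppressed.

```latex
\begin{aligned}
E_\pi[w^{*2}] &= \int \frac{p(x)^2}{\pi(x)}\,dx
  = \frac{\sigma}{\sqrt{2\pi}} \int \exp\!\left(-x^2\,\frac{2\sigma^2-1}{2\sigma^2}\right) dx
  = \frac{\sigma^2}{\sqrt{2\sigma^2-1}}, \qquad \sigma^2 > \tfrac{1}{2}, \\
\sigma = 1:&\quad E_\pi[w^{*2}] = 1 \quad \text{(no inflation, since } \pi = p\text{)}, \\
\sigma = 2:&\quad E_\pi[w^{*2}] = 4/\sqrt{7} \approx 1.51 .
\end{aligned}
```

Under these assumptions, an importance density twice as wide as the true density inflates the approximate Monte Carlo variance in (B.13) by roughly 51%. For $\sigma^2 \le 1/2$ the integral diverges, so the factor is unbounded when the importance density has lighter tails than the true density.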

Recall, we wish to choose $N_p$ such that $\mathrm{var}_\pi[\hat{\mu}_g|z]$ is small relative to the true
variance $\mathrm{var}_p[g|z]$. Thus, the ratio


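$$
\frac{\mathrm{var}_\pi[\hat{\mu}_g|z]}{\mathrm{var}_p[g|z]} \approx \frac{\tfrac{1}{N_p}\,\mathrm{var}_p[g|z]\left(\mathrm{var}_\pi[w^*|z] + 1\right)}{\mathrm{var}_p[g|z]} = \frac{\mathrm{var}_\pi[w^*|z] + 1}{N_p}
\tag{B.14}
$$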


should be a small number, say $\epsilon$. The effective sample size $N_{\mathrm{eff}}$ is defined as the reciprocal of the approximation in (B.14). Then, we have the following (approximate)
duality:

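$$
\frac{\mathrm{var}_\pi[\hat{\mu}_g|z]}{\mathrm{var}_p[g|z]} < \epsilon
\quad\Longleftrightarrow\quad
N_{\mathrm{eff}} \triangleq \frac{N_p}{\mathrm{var}_\pi[w^*|z] + 1} > \frac{1}{\epsilon}.
$$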

Thus, if we wanted the Monte Carlo variation to be less than 1% of the variance of $g$,
we should require that $N_{\mathrm{eff}} > 100$ after each measurement update.
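
The check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the function names `effective_sample_size` and `meets_variance_target` are hypothetical, the true weights $w^*$ are approximated by rescaling the particle weights so that their sample mean is 1 (mirroring $E_\pi[w^*|z] = 1$), and $\mathrm{var}_\pi[w^*|z]$ is replaced by the sample variance of those rescaled weights.

```python
import math


def effective_sample_size(weights):
    """Plug-in estimate of N_eff = Np / (var_pi[w*|z] + 1).

    Assumption: `weights` are non-negative particle weights known only up
    to a common scale factor; they are rescaled to have sample mean 1.
    """
    n_p = len(weights)
    total = math.fsum(weights)
    if n_p == 0 or total <= 0.0:
        raise ValueError("need at least one particle with positive weight")
    # Rescale so the sample mean of w* equals 1.
    w_star = [n_p * w / total for w in weights]
    # Sample variance of w* about its (known) mean of 1.
    var_w = math.fsum((w - 1.0) ** 2 for w in w_star) / n_p
    return n_p / (var_w + 1.0)


def meets_variance_target(weights, epsilon):
    """True if N_eff > 1/epsilon, i.e. the approximate variance ratio is below epsilon."""
    return effective_sample_size(weights) > 1.0 / epsilon


# Equal weights give N_eff = Np; a single dominant particle gives N_eff = 1.
print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))
print(effective_sample_size([1.0, 0.0, 0.0, 0.0]))
```

With this particular plug-in variance, the estimate reduces algebraically to the reciprocal of the sum of squared normalized weights, so either form could be used to monitor the filter after each measurement update.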

The use of $N_{\mathrm{eff}}$ as a monitor of filter performance relies on the validity of the two
Taylor series approximations used in the derivation. It is straightforward to show that
the residual in the approximation used in (B.10) is $E_p[(w^* - \mu_w)(g - \mu_g)^2|z]$. Thus, the
use of $N_{\mathrm{eff}}$ (or $\hat{N}_{\mathrm{eff}}$) as an indicator of the ratio $\mathrm{var}_\pi[\hat{\mu}_g|z]/\mathrm{var}_p[g|z]$ can be misleading
if $g(x)$ differs substantially from $\mu_g$ in regions of the state space where $p(x|z)$ is
significant. However, this approximation does have the benefit of yielding an effective
sample size $N_{\mathrm{eff}}$ that is independent of the function $g$.
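
The residual quoted above can be verified directly. The following is a short sketch of that step, using the shorthand $a = g - \mu_g$ and $b = w^* - \mu_w$, which is introduced here only for this check and does not appear elsewhere in the derivation.

```latex
\begin{aligned}
g^2 w^* &= (\mu_g + a)^2(\mu_w + b) \\
&= \underbrace{\mu_g^2\mu_w + 2\mu_g\mu_w\,a + \mu_g^2\,b + \mu_w\,a^2 + 2\mu_g\,ab}_{\text{second-order expansion (B.9)}} + a^2 b, \\
E_p[g^2 w^*|z] &= \mu_g^2\mu_w + \mu_w\,\mathrm{var}_p[g|z] + 2\mu_g\,\mathrm{cov}_p[g,w^*|z]
  + E_p\!\left[(w^* - \mu_w)(g - \mu_g)^2\,\big|\,z\right].
\end{aligned}
```

Because $g^2 w^*$ is a cubic polynomial in $(g, w^*)$, the single third-order term $a^2 b$ is the entire remainder, so the approximation in (B.10) is exact whenever its expectation under $p(x|z)$ vanishes.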